(54) Fast DCT inverse motion compensation

(57) Downsampling and inverse motion compensation are performed on compressed domain representations for video. By directly manipulating the compressed domain representation instead of the spatial domain representation, computational complexity is significantly reduced. For downsampling, the compressed stream is processed in the compressed (DCT) domain without explicit decompression and spatial domain downsampling, so that the resulting compressed stream corresponds to a scaled-down image and conforms to the standard syntax of 8 x 8 DCT matrices. For typical data sets, this approach of downsampling in the compressed domain results in computation savings of around 80% compared with traditional spatial domain methods for downsampling from compressed data. For inverse motion compensation, motion compensated compressed video is converted into a sequence of DCT domain blocks corresponding to the spatial domain blocks in the current picture alone. By performing inverse motion compensation directly in the compressed domain, the reduction in computational complexity is around 68% compared with traditional spatial domain methods for inverse motion compensation from compressed data. The techniques for downsampling and inverse motion compensation can be used in a variety of applications, such as multipoint video conferencing and video editing.
Description

BACKGROUND OF THE INVENTION 
TECHNICAL FIELD

The invention relates to data compression. More particularly, the invention relates to performing such operations as 
downsampling and inverse motion compensation directly on compressed domain representations. 

DESCRIPTION OF THE PRIOR ART

Many image and video processing applications require real-time manipulation of digital image or video data to implement composition and special effects, e.g. downsampling (zooming in or out), modifying contrast and brightness, translating, filtering, masking, rotation, and inverse motion compensation. Real-time manipulation of the image and video data may be problematic since in many instances the data is available only in compressed form. Typically, the image and video data may have been compressed according to one of the compression standards such as JPEG, MPEG-1, MPEG-2, H.261 or H.263 (hereinafter, collectively "compression standard" or equivalent).

The traditional approach to dealing with compressed domain data is to first decompress the data to obtain a spatial domain representation, then apply the desired image or video manipulation techniques for the desired composition or special effect, and finally compress the manipulated data so that the resulting bitstream conforms to the compression standard.

Fig. 1 is a flow chart of the traditional approach to manipulation of image data. Initially, an image is stored on a disk 112. The image is stored in a compressed format using one of any number of industry standard compression schemes in order to reduce the amount of memory space required to store the image. Many of these compression schemes use the so-called discrete cosine transform (DCT) to convert the original image data from the spatial domain to the compressed domain. The 8x8 2D-DCT transform converts a block {x(n,m)} in the spatial domain into a corresponding matrix of frequency components {X(k,l)} according to the following equation:

$$X(k,l) = \frac{c(k)}{2}\,\frac{c(l)}{2} \sum_{n=0}^{7}\sum_{m=0}^{7} x(n,m)\cos\!\left(\frac{(2n+1)\pi k}{16}\right)\cos\!\left(\frac{(2m+1)\pi l}{16}\right)$$

where $c(0) = 1/\sqrt{2}$ and $c(k) = 1$ for $k > 0$.
The conventional approach does not operate on the compressed data. Instead, the image data is transformed from the compressed domain back to the spatial domain in step 114. If the compression scheme uses the DCT, the decompression scheme uses the inverse DCT transform that is given by the following equation:

$$x(n,m) = \sum_{k=0}^{7}\sum_{l=0}^{7} \frac{c(k)}{2}\,\frac{c(l)}{2}\, X(k,l)\cos\!\left(\frac{(2n+1)\pi k}{16}\right)\cos\!\left(\frac{(2m+1)\pi l}{16}\right)$$
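The forward and inverse transforms above can be checked numerically. The following Python sketch evaluates both equations literally as double sums (deliberately slow, and not the fast factorized algorithm discussed later); the function names `dct2`, `idct2`, `c` and `basis` are illustrative only.

```python
import math

N = 8  # block size of the 8x8 DCT


def c(k):
    """Normalization factor: c(0) = 1/sqrt(2), c(k) = 1 for k > 0."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def basis(p, k):
    """cos((2p + 1) * pi * k / 16), shared by the forward and inverse sums."""
    return math.cos((2 * p + 1) * math.pi * k / 16)


def dct2(x):
    """Forward 8x8 2D-DCT evaluated directly from the double-sum definition."""
    return [[c(k) / 2 * c(l) / 2
             * sum(x[n][m] * basis(n, k) * basis(m, l)
                   for n in range(N) for m in range(N))
             for l in range(N)]
            for k in range(N)]


def idct2(X):
    """Inverse 8x8 2D-DCT evaluated directly from the double-sum definition."""
    return [[sum(c(k) / 2 * c(l) / 2 * X[k][l] * basis(n, k) * basis(m, l)
                 for k in range(N) for l in range(N))
             for m in range(N)]
            for n in range(N)]


if __name__ == "__main__":
    block = [[(3 * n + 5 * m) % 17 for m in range(N)] for n in range(N)]
    back = idct2(dct2(block))
    err = max(abs(back[n][m] - block[n][m]) for n in range(N) for m in range(N))
    print(f"max round-trip error: {err:.2e}")
```

With this normalization, a constant block of value a has a single nonzero coefficient X(0,0) = (1/(2√2))² · 64a = 8a, which is a convenient sanity check.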



Once the data is returned to the spatial domain, conventional image manipulation techniques can be used in step 116 to produce a desired image. For inverse motion compensation, step 116 entails using conventional inverse motion compensation techniques. The processed data is then compressed again in step 118 using the same compression scheme and stored back to disk 120. Although disks 112 and 120 are shown separately, they can in fact be one and the same.

The traditional approach is cumbersome due to (1) the high computational complexity of the decompression and compression tasks, and (2) the large volume of spatial domain data that has to be manipulated. Thus, traditional approaches may not be feasible in many practical applications.

For this reason there has been a great effort in recent years to develop fast algorithms that perform these tasks directly in the compressed domain and thereby avoid the need for decompression. See, for example, S. F. Chang and D. G. Messerschmitt, Manipulation and Compositing of MC-DCT Compressed Video, IEEE Journal on Selected Areas in Communications, Vol. 13, No. 1, pp. 1-11, 1994; S. F. Chang, D. G. Messerschmitt, A New Approach to Decoding and Compositing Motion-Compensated DCT Based Images, Proc. ICASSP '93, Minneapolis, April 1993; W. Kou, T. Fjalbrant, A Direct Computation of DCT Coefficients for a Signal Block Taken from Two Adjacent Blocks, IEEE Trans. Signal Proc., Vol. SP-39, pp. 1692-1695, July 1991; J. B. Lee, B. G. Lee, Transform Domain Filtering Based on Pipelining Structure, IEEE Trans. Signal Proc., Vol. SP-40, pp. 2061-2064, August 1992.

Video conferencing provides a suitable example for image manipulation. Consider a video conferencing session of several parties, where each party can see all other parties in separate windows on the screen of the party's workstation. It is desirable that every user have the flexibility to resize windows, move the windows from one location on the screen to another, and so on. Due to the limited computation capabilities within a workstation, it may be possible to efficiently handle only one video stream. In this scenario, it is still possible to have multiparty video conferencing by having the video streams from all of the parties first sent to a network server. The server composes the streams from all parties into a single stream.

The traditional approach requires that all compressed video streams are first decompressed at the server; the desired change is then translated into a suitable arithmetic operation on the decompressed video streams with the appropriate composition into a single stream; and finally, the composite stream is compressed again and sent to the user. Note that the video streams that are input to the server and the composite stream that is output by the server may have to conform to a compression standard. In standards-compliant environments, a great deal of the computational load in the server is in the computation of the inverse discrete cosine transform (IDCT) during the decompression process and the computation of the DCT during the compression process. Because so much of the computational load lies in the DCT and IDCT operations, it would be advantageous to generate the composite stream directly in the DCT domain. It is therefore desirable to have a server in which the IDCT, stream composition, and DCT functions can be efficiently combined to yield the desired standards-compliant single composite stream. Such a server would have the advantage of being less complex than a server that implements the traditional image stream composition. An additional advantage of such a server would be that this manner of stream composition would not put a heavy burden on user workstations, because each user workstation now has to deal with only a single composite stream and thus needs to perform only a single decompression process. An added benefit is that communication between the server and the user's workstation does not require very high bandwidth.

Another application which benefits from image manipulation in the compressed domain is an image kiosk and the delivery of images from the image kiosk to the home. Typically, when a user connects to the image kiosk over a network, prior to obtaining the desired image, the user may want to browse through a catalog of images deliverable from the kiosk. For the purposes of image browsing, it may be adequate to provide the user a downsampled version of the image in compressed form; e.g., if the original image is of size 640 by 480 and is stored in the image kiosk according to the JPEG compression format, it may be sufficient to deliver to the user a JPEG compressed image which is the corresponding downsampled-by-two image of size 320 by 240. The problem that occurs when downsampling a compressed image conforming to the DCT-based compression standards is that, to yield another compressed image, it is necessary to take four compressed data blocks and produce a single compressed block. That is, to achieve one-half resolution, it is necessary to reduce a set of compressed data blocks in both the horizontal (X) and the vertical (Y) directions (two compressed data blocks x two compressed data blocks = four compressed data blocks). Similarly, when downsampling by three, it is necessary to combine nine compressed data blocks, i.e. three compressed data blocks each in the X and Y directions, to produce one scaled, compressed data block.

For those standards discussed above that rely upon DCT processing, one is constrained to an 8 by 8 DCT block geometry, such that the downsampled output must also have an 8 by 8 DCT block geometry so that the compressed bitstream can be decoded by any generic decompressor capable of handling a JPEG, MPEG-1, MPEG-2, H.261 or H.263 compressed bitstream. Note that there are known methods for image downsampling given an 8 by 8 DCT block geometry that simply perform a 4 by 4 IDCT to yield a downsampled-by-two image. This approach does not preserve the original 8 by 8 DCT format, whereas most industry standard hardware and software, for example for the JPEG, MPEG-1, MPEG-2, H.261 and H.263 standards, require that the data be maintained in the standard 8 by 8 DCT format. Thus, while such an approach does offer the capability of processing data directly in the compressed domain without converting the data back to an uncompressed form, those algorithms that have been developed for downsampling directly on compressed domain representations produce data outputs that are not compliant with industry standard hardware and software configurations.

A second problem, especially with regard to DCT-based data compression schemes, concerns inverse motion compensation. Video is best thought of as a sequence of pictures. With regard to compressed video, the first picture is typically considered an anchor picture that is processed with a compression scheme, such as defined by the MPEG standard. The second picture in the sequence exploits the fact that in video there is a strong correlation between successive pictures, and therefore full information for the second picture is not needed. Rather, the compression scheme sends information that corresponds to the difference between the two pictures. The process of computing the difference is known as motion compensation, and the difference picture is typically referred to as a predictive picture or a motion compensated picture. Such a scheme achieves considerable data compression because it removes information that is redundant over time.
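To make the dependency on the anchor picture concrete, here is a minimal spatial-domain sketch of block-based motion compensation. It assumes integer-pel motion expressed as an absolute (top, left) position in the reference picture, a simplification of what real codecs do (they code displacements relative to the block position, often at half-pel accuracy); `block_at`, `mc_residual` and `mc_reconstruct` are hypothetical helper names.

```python
def block_at(picture, top, left, size=8):
    """Return the size x size block of `picture` whose top-left pixel is (top, left)."""
    return [row[left:left + size] for row in picture[top:top + size]]


def mc_residual(current_block, reference, top, left):
    """Encoder side: current block minus its motion-compensated prediction."""
    pred = block_at(reference, top, left)
    return [[cur - p for cur, p in zip(cr, pr)] for cr, pr in zip(current_block, pred)]


def mc_reconstruct(residual, reference, top, left):
    """Decoder side: prediction plus residual; impossible without `reference`."""
    pred = block_at(reference, top, left)
    return [[p + r for p, r in zip(pr, rr)] for pr, rr in zip(pred, residual)]


if __name__ == "__main__":
    # Hypothetical 16x16 anchor picture and a current block that is a displaced,
    # slightly brightened copy of part of it.
    anchor = [[(3 * i + 7 * j) % 23 for j in range(16)] for i in range(16)]
    current = [[v + 1 for v in row] for row in block_at(anchor, 3, 5)]
    res = mc_residual(current, anchor, 3, 5)
    print(res[0])                                      # [1, 1, 1, 1, 1, 1, 1, 1]
    print(mc_reconstruct(res, anchor, 3, 5) == current)  # True
```

Only the small residual is coded, which is where the compression gain comes from; `mc_reconstruct` shows why removing the anchor picture makes the current block undecodable.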

The problem with such schemes occurs, for example, when using a video editor. If the anchor picture is removed, then all subsequent pictures lose their context, because they were all dependent on the anchor picture. Therefore, it is not presently practical to perform video editing out of sequence, e.g. in the compressed domain. It is therefore necessary to take the compressed stream, where the dependencies between pictures are contained in the bitstream, and then remove the dependency between the pictures. This process is known as inverse motion compensation. One way to remove this dependency is to completely decompress the bitstream, which yields each of the pictures, one after the other, such that they can be viewed as pictures. That is, each of the pictures is returned to its original state. The pictures can now be edited and then may have to be compressed again for storage or transmission. Unfortunately, the process of decompression followed by the desired editing function and recompression is computationally very expensive.

Thus, rather than performing image and video processing, such as downsampling and/or inverse motion compensation, in the uncompressed domain, it would be advantageous to perform such processing in the compressed domain by directly manipulating the DCT domain representation of the data.

SUMMARY OF THE INVENTION 

The invention provides schemes that perform downsampling and inverse motion compensation on compressed domain representations. By directly manipulating the compressed domain representation instead of the spatial domain representation, the techniques disclosed herein significantly reduce computational complexity. Furthermore, because the invention provides techniques that perform the processing in the compressed domain, there is no loss of quality that would otherwise occur if the image were decompressed and then recompressed using a lossy algorithm, such as MPEG or JPEG.

In the case of downsampling, a method has been developed wherein the compressed stream is processed in the compressed (DCT) domain without explicit decompression and spatial domain downsampling, so that the resulting compressed stream corresponds to a scaled-down image. The method disclosed herein ensures that the resulting compressed stream conforms to the standard syntax of 8 x 8 DCT matrices. For typical data sets, this approach of downsampling in the compressed domain results in computation savings of around 80% compared with traditional spatial domain methods for downsampling, wherein the data is first decompressed, then downsampled in the spatial domain, and then recompressed for storage or transmission.

In the case of inverse motion compensation, a method is disclosed to convert motion compensated compressed video into a sequence of DCT domain blocks corresponding to the spatial domain blocks in the current picture alone, i.e. each picture is represented as a sequence of DCT blocks that do not depend on data in other frames. By performing inverse motion compensation directly in the compressed domain, the reduction in computational complexity is around 68% compared with traditional spatial domain methods for inverse motion compensation from compressed data.

The functions of downsampling and inverse motion compensation can be used in a variety of applications such as 
multipoint video conferencing and video editing. 

BRIEF DESCRIPTION OF THE DRAWINGS 


FIG. 1 is a flow chart of a prior art method of filtering image data in the spatial domain. 

FIG. 2 is a system diagram for a generic image/video editor according to the invention. 

Fig. 3 is a block schematic diagram of a system for performing image and video processing in the compressed
domain according to the invention. 

Fig. 4 is a block schematic diagram of the downsampling unit of the present invention. 

Fig. 5 is a block diagram of matrix arithmetic hardware for performing DCT based downsampling by a factor of two
according to a preferred embodiment of the invention. 

Fig. 6 is a block schematic diagram of an inverse motion compensation unit of the present invention.

Fig. 7 is a block diagram of matrix arithmetic hardware for performing DCT based inverse motion compensation
according to the present invention. 

DETAILED DESCRIPTION OF THE INVENTION 

One aspect of the invention reduces the time required to perform such image processing operations as downsampling by factors such as 2, 3, and 4, as compared to traditional approaches. As used herein, the term "downsampling by a factor k" means that if the input image resolution is M x N pixels, then after downsampling, the resulting spatial domain image resolution is (M/k) x (N/k). Because the downsampling transformation is linear, and the DCT and IDCT are linear as well, the overall effect in the DCT domain is linear, and hence the basic operation can be represented as multiplication by a fixed matrix. Fast multiplication by this matrix is possible if it can be factorized into a product of sparse matrices whose entries are mostly 0, 1, and -1.
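The definition of downsampling by a factor k, and the linearity on which the fixed-matrix argument rests, can be illustrated with a short Python sketch. It works purely in the spatial domain; `downsample` and `combine` are illustrative names, and the image dimensions are assumed to be multiples of k.

```python
def downsample(img, k):
    """Downsample by a factor k: an M x N image becomes an (M/k) x (N/k) image.

    Every nonoverlapping k x k group of pixels is replaced by its average.
    """
    rows, cols = len(img), len(img[0])
    if rows % k or cols % k:
        raise ValueError("image dimensions must be multiples of k")
    return [
        [sum(img[i * k + a][j * k + b] for a in range(k) for b in range(k)) / (k * k)
         for j in range(cols // k)]
        for i in range(rows // k)
    ]


def combine(a, img1, b, img2):
    """Pixelwise a*img1 + b*img2, used to check linearity."""
    return [[a * p + b * q for p, q in zip(r1, r2)] for r1, r2 in zip(img1, img2)]


if __name__ == "__main__":
    img1 = [[(i * 5 + j) % 9 for j in range(8)] for i in range(8)]
    img2 = [[(i + 3 * j) % 4 for j in range(8)] for i in range(8)]
    lhs = downsample(combine(2, img1, -3, img2), 2)
    rhs = combine(2, downsample(img1, 2), -3, downsample(img2, 2))
    print(lhs == rhs)
```

Because `downsample` commutes with linear combinations, composing it with the (linear) IDCT before and the DCT after gives a linear map on DCT coefficients, i.e. a fixed matrix.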

The invention provides a scheme that performs image processing in the compressed domain efficiently by taking advantage of the factorizations of the DCT and IDCT operation matrices that correspond to the fast 8-point DCT/IDCT (see W. B. Pennebaker, J. L. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, pp. 50-63, 1993).

The resulting schemes for downsampling save about 37% of the computations for a downsampling factor of two, 39% for a downsampling factor of three, and 50% for a downsampling factor of four. As used herein, the term "computation" corresponds to the basic arithmetic operations of a microprocessor, which are shift, add, shift and add, shift-one and add (SH1ADD), shift-two and add (SH2ADD), and shift-three and add (SH3ADD). These savings are worst-case estimates because no assumptions have been made regarding sparseness in the DCT domain. Typically, in a considerably large percentage of the DCT blocks, all of the DCT coefficients are zero except for those in the upper left 4 x 4 quadrant, which corresponds to low frequencies in both the vertical and horizontal directions. If this fact is taken into account, then computation reductions can reach about 80%.

Another advantage of the scheme herein disclosed is that it improves the precision of the computations when compared to traditional approaches. It has been found that the improvement in precision varies between 1.5 and 3 dB. Thus, because the invention provides a technique that performs image processing in the compressed domain, there is no loss of image quality that would otherwise occur if the image were decompressed and then recompressed using a lossy algorithm, such as MPEG or JPEG, where each compression/decompression step intentionally and irrevocably discards a portion of the image to effect a high degree of compression.

Another embodiment of the invention provides a scheme that includes a fast algorithm for undoing the motion compensation operation in the DCT domain (see S. F. Chang and D. G. Messerschmitt, Manipulation and Compositing of MC-DCT Compressed Video, IEEE Journal on Selected Areas in Communications, Vol. 13, No. 1, pp. 1-11, 1994; and S. F. Chang, D. G. Messerschmitt, A New Approach to Decoding and Compositing Motion-Compensated DCT Based Images, Proc. ICASSP '93, Minneapolis, April 1993). The algorithm described herein receives as input DCT blocks of motion compensated compressed video, and provides DCT blocks of the corresponding spatial domain blocks of the current picture alone, without reference to past and future pictures. The operation of inverse motion compensation enables video compositing in the DCT compressed domain and also enables other video processing functions such as downsampling, overlapping, translation, and filtering.

One aspect of the invention further develops and improves on the scheme proposed by Chang and Messerschmitt, ibid., for inverse motion compensation. Whereas in that scheme computations are saved only if the DCT blocks are sufficiently sparse, and only if a large fraction of the reference blocks are aligned to the boundaries between the original blocks in at least one direction, the scheme disclosed herein reduces the computational complexity by 47% compared to traditional approaches, even without any prior assumptions on sparseness or perfect alignment. If, in addition, DCT blocks are assumed sparse in the sense that only the top-left 4x4 subblocks are nonzero, as is typically the case, then computational complexity is reduced by 68%.

In compressed domain downsampling, a key aspect of the invention is the following: given four 8x8 DCT domain blocks, the objective is to scale down the four blocks to yield a single 8x8 DCT block. The invention disclosed herein uses the frequency domain representation of the spatial domain operation and incorporates this representation in the DCT matrix factorization process.

In compressed domain inverse motion compensation, a key aspect of the invention is that given the DCT of the dif- 
ference between blocks from two pictures, and the DCT of the block in one of the pictures, one needs to derive the DCT 
of the corresponding block in the second picture. Due to the fact that the blocks are not collocated in the two pictures, 
this computation is not trivial. 
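Why non-collocated blocks make this step non-trivial can be seen in the shift-matrix formulation of Chang and Messerschmitt cited above, as we understand it: an 8x8 reference block at offset (h, w) inside the 16x16 area spanned by four stored blocks x1 (NW), x2 (NE), x3 (SW), x4 (SE) is x = Σ H_i x_i W_i for 0/1 window matrices H_i and W_i, and because S is orthonormal its DCT is Σ (S H_i S^t) X_i (S W_i S^t). The Python sketch below is a plain, unoptimized illustration of that identity, not the fast factorized scheme of the invention; `shift_matrix` and `extract_dct` are hypothetical names.

```python
import math

N = 8


def dct_matrix():
    """8-point DCT matrix, s(k, n) = c(k)/2 * cos((2n + 1) * pi * k / 16)."""
    return [[(1 / math.sqrt(2) if k == 0 else 1.0) / 2
             * math.cos((2 * n + 1) * math.pi * k / 16) for n in range(N)]
            for k in range(N)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def shift_matrix(row0, col0, size):
    """8x8 zero matrix holding a size x size identity whose first 1 is at (row0, col0)."""
    M = [[0.0] * N for _ in range(N)]
    for i in range(size):
        M[row0 + i][col0 + i] = 1.0
    return M


def extract_dct(X1, X2, X3, X4, h, w):
    """DCT of the 8x8 window at offset (h, w), 0 <= h, w <= 7, inside the 16x16
    area formed by blocks NW (X1), NE (X2), SW (X3), SE (X4), using only their DCTs."""
    S = dct_matrix()
    St = transpose(S)
    H_top = shift_matrix(0, h, N - h)    # window rows 0..7-h <- block rows h..7
    H_bot = shift_matrix(N - h, 0, h)    # window rows 8-h..7 <- block rows 0..h-1
    W_left = shift_matrix(w, 0, N - w)   # window cols 0..7-w <- block cols w..7
    W_right = shift_matrix(0, N - w, w)  # window cols 8-w..7 <- block cols 0..w-1
    out = [[0.0] * N for _ in range(N)]
    for Xi, H, W in ((X1, H_top, W_left), (X2, H_top, W_right),
                     (X3, H_bot, W_left), (X4, H_bot, W_right)):
        # DCT(H x W) = (S H S^t) X (S W S^t) because S^t S = I
        left = matmul(matmul(S, H), St)
        right = matmul(matmul(S, W), St)
        term = matmul(matmul(left, Xi), right)
        out = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(out, term)]
    return out
```

Since the DCT is linear, the DCT of the current block would then be this predicted DCT block plus the dequantized residual DCT block. The products S H_i S^t and S W_i S^t depend only on (h, w), so a practical implementation would precompute them rather than rebuild them on every call as this sketch does.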

For image and video data compressed with the JPEG, MPEG, or Px64 compression method, the invention provides a technique that can downsample the image or video data with fewer operations than are needed if the downsampling function were implemented in the spatial domain. For inverse motion compensation, the method described herein leads to significant computation savings over equivalent spatial domain methods. The methods disclosed herein for downsampling and inverse motion compensation are well suited for efficient video editing systems and scalable multipoint video conferencing systems.

Image/video editor

Before turning to the details of image downsampling and inverse motion compensation according to the invention, a proposed system in which one or both of these functions reside is shown in FIG. 2. The system includes a disk on which a preprocessed image 126 is stored in a file 124. The image 126 is compressed by a compression engine 128 according to one of the DCT-based compression schemes (e.g., JPEG, MPEG, H.261 or H.263). The image, for example, could be produced by a photoprocessing shop. The compressed image is then stored in a file 124 to allow the owner of the photograph to edit the photograph using an image/video editor 130. The image/video editor 130 includes image downsampling or inverse motion compensation functionality according to the invention that operates on the compressed image data stored in the file 124 to produce processed image data, which is also in the DCT domain and is stored in another file 129 after processing. In an alternative embodiment, both downsampling and inverse motion compensation functionality reside in the image/video editor 130. The image/video editor 130 can include a plurality of predefined downsampling functions according to the invention and can allow the user to specify which one or more of these downsampling functions are desired. Although file 129 is shown on a separate disk from file 124, these disks need only be logically separate. In fact, both files 124 and 129 can be stored on the same physical disk.

Once the compressed image has been edited by the image/video editor 130, the photoprocessing shop can then decompress the edited image with decompression engine 132 to produce an edited image 134, which can then be processed by the photoprocessing shop and sent to the owner.

Fig. 3 is a block schematic diagram of a system 100 for performing image and video processing in the compressed domain according to the invention. The image processing system 100 includes a disk drive 111, a memory 15, and an image/video editor 130. The image/video editor 130 may be a special purpose computer or a general purpose computer programmed to provide the image/video editor functions of the present invention.
The image/video editor 130 contains a Huffman decoder 12 for partially decoding compressed images. In the example of Fig. 3, a compressed image file 124 on disk drive 111 is input into the Huffman decoder 12 as a compressed bitstream 11. The compressed bitstream 11 is compressed in accordance with any known DCT-based compression scheme, such as MPEG, JPEG, H.261, or H.263. The Huffman decoder 12 writes to memory 15 a partially decoded image 128, which is subsequently provided to image/video editor 130.
The image/video editor 130 contains a dequantizer 14 which is connected to the memory 15 to receive partially decoded images 128. The dequantizer 14 contains functionality to generate 8 x 8 blocks in the DCT domain. These 8 x 8 DCT blocks are now amenable to DCT processing methods such as DCT domain downsampling, discussed below in conjunction with Figs. 3, 4, and 5, and DCT domain inverse motion compensation, discussed below in conjunction with Figs. 3, 6, and 7.

The core of the image/video editor 130 is formed of the downsampling unit 10 and the inverse motion compensation unit 30. Various alternative embodiments are possible containing either the downsampling unit 10 or the inverse motion compensation unit 30 or both (as shown in Fig. 3). Both the downsampling unit 10 and the inverse motion compensation unit 30 are connected to the dequantizer 14 to obtain dequantized image matrices.

The output from the downsampler 10 is a downsampled image 23, and the output from the inverse motion compensation module 30 is an inverse motion compensation frame 43. These respective outputs are further processed by a quantizer 26 and by a Huffman encoder 24. The Huffman encoder 24 reverses the partial decoding performed by the Huffman decoder 12. Finally, the re-encoded image is transmitted back to its origin, e.g., as a processed compressed image file 129 on disk drive 111.

Downsampling in the Compressed Domain

Fig. 4 is a block schematic diagram of the downsampling unit 10 for performing downsampling in the compressed domain according to the invention. As discussed above, the dequantizer 14 obtains a partially decoded image 128 from the memory 15 and extracts therefrom a series of 8 x 8 DCT blocks 22, which are then provided to the downsampling unit 10.

As discussed in greater detail below, the 8 x 8 DCT blocks $X_1, X_2, \ldots$ from memory are processed by a determining means 16, which calculates a value $X$ according to the downsampling factor selected in downsampling means 17. Downsampling means 17 can provide any desired downsampling factor, such as a downsample-by-two factor 18, in which four 8x8 DCT blocks are combined to yield a single 8x8 DCT block; a downsample-by-three factor 19, in which nine 8x8 DCT blocks are combined to yield a single 8x8 DCT block; a downsample-by-four factor 20, in which sixteen 8x8 DCT blocks are combined to yield a single 8x8 DCT block; or another downsample factor 21. The downsampled image X 23 is then output for further processing or display/reproduction, as desired. For example, as shown in Fig. 3, image 23 may be provided to a Huffman encoder 24 to produce a compressed bitstream 25 output, which ultimately is stored as a compressed image file 129 on a disk drive 111.
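The way the downsampling unit gathers its inputs, k x k neighboring DCT blocks per output block, can be sketched as follows. This assumes the block grid dimensions are multiples of k (a real encoder would have to pad otherwise) and orders each group in raster-scan order; `group_blocks` is a hypothetical helper, not a component named in the figures.

```python
def group_blocks(grid, k):
    """Split a grid of 8x8 DCT blocks (a list of block rows) into k x k groups.

    Each group is a flat list of k*k blocks in raster-scan order (north-west
    block first, then left to right, top to bottom); each group is the input
    for one output 8x8 DCT block.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % k or cols % k:
        raise ValueError("block grid must be a multiple of k in both directions")
    return [[grid[gi + a][gj + b] for a in range(k) for b in range(k)]
            for gi in range(0, rows, k)
            for gj in range(0, cols, k)]


if __name__ == "__main__":
    # Label each block by its (block row, block column) position.
    grid = [[(r, c) for c in range(4)] for r in range(4)]
    for group in group_blocks(grid, 2):
        print(group)
```

For the 640 by 480 kiosk example, the image holds 80 x 60 blocks; with k = 2 this yields 40 x 30 groups, i.e. 40 x 30 output blocks, which is exactly a 320 by 240 image.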

Uniquely, the scheme herein described combines at least four DCT blocks to accomplish downsampling in the compressed domain, while at the same time it provides an output DCT block that is the same size as each of the original blocks, i.e. four or more 8x8 DCT blocks are combined to produce a single 8x8 DCT block that represents the pixel-averaged (downsampled) content of the four or more blocks.

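The 8x8 2D-DCT transforms a block $\{x(n,m)\}_{n,m=0}^{7}$ in the spatial domain into a matrix of frequency components $\{X(k,l)\}_{k,l=0}^{7}$ according to the following equation:

$$X(k,l) = \frac{c(k)}{2}\,\frac{c(l)}{2} \sum_{n=0}^{7}\sum_{m=0}^{7} x(n,m)\cos\!\left(\frac{(2n+1)\pi k}{16}\right)\cos\!\left(\frac{(2m+1)\pi l}{16}\right) \qquad (1)$$

where $c(0) = 1/\sqrt{2}$ and $c(k) = 1$ for $k > 0$. The inverse transform is given by:

$$x(n,m) = \sum_{k=0}^{7}\sum_{l=0}^{7} \frac{c(k)}{2}\,\frac{c(l)}{2}\, X(k,l)\cos\!\left(\frac{(2n+1)\pi k}{16}\right)\cos\!\left(\frac{(2m+1)\pi l}{16}\right) \qquad (2)$$

In a matrix form, let $x = \{x(n,m)\}_{n,m=0}^{7}$ and $X = \{X(k,l)\}_{k,l=0}^{7}$. Define the 8-point DCT matrix $S = \{s(k,n)\}_{k,n=0}^{7}$, where

$$s(k,n) = \frac{c(k)}{2}\cos\!\left(\frac{(2n+1)\pi k}{16}\right) \qquad (3)$$

Then,

$$X = S\,x\,S^{t} \qquad (4)$$

where the superscript $t$ denotes matrix transposition. Similarly, let the superscript $-t$ denote transposition of the inverse. Then,

$$x = S^{-1} X S^{-t} = S^{t} X S \qquad (5)$$

where the second equality follows from the unitarity of S.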
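The matrix form can be checked directly: the Python sketch below builds S from Eq. (3), confirms that S S^t = I (the unitarity used in Eq. (5)), and implements Eqs. (4) and (5) as two matrix products each. Names such as `dct_matrix`, `dct2` and `idct2` are illustrative.

```python
import math

N = 8


def dct_matrix():
    """S = {s(k, n)}, s(k, n) = c(k)/2 * cos((2n + 1) * pi * k / 16), Eq. (3)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    return [[c(k) / 2 * math.cos((2 * n + 1) * math.pi * k / 16) for n in range(N)]
            for k in range(N)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


S = dct_matrix()
St = transpose(S)


def dct2(x):
    """Eq. (4): X = S x S^t."""
    return matmul(matmul(S, x), St)


def idct2(X):
    """Eq. (5): x = S^t X S, valid because S is orthonormal (S^-1 = S^t)."""
    return matmul(matmul(St, X), S)


if __name__ == "__main__":
    I = matmul(S, St)
    off = max(abs(I[i][j] - (1.0 if i == j else 0.0)) for i in range(N) for j in range(N))
    print(f"s(0, n) = {S[0][0]:.4f}, max |S S^t - I| = {off:.1e}")
```

Note that s(0, n) = 1/(2√2) ≈ 0.3536, the same value that appears as the first entry of the diagonal matrix D in the factorization of S given later.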
Now, suppose there are four adjacent 8x8 spatial domain data blocks $x_1$, $x_2$, $x_3$, and $x_4$ that together form a 16 x 16 square, where $x_1$ corresponds to the northwest, $x_2$ to the northeast, $x_3$ to the southwest, and $x_4$ to the southeast. Downsampling (decimation) by a factor of two in each dimension means that every nonoverlapping group of four pixels forming a small 2x2 block is replaced by one pixel whose intensity is the average of the four original pixels. As a result, the original blocks $x_1, \ldots, x_4$ are replaced by a single 8x8 output block $x$ corresponding to the downsampling of $x_1, \ldots, x_4$. Our task is to calculate efficiently $X$, the DCT of $x$, directly from the given DCTs of the original blocks, $X_1$, $X_2$, $X_3$, and $X_4$.
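Any DCT-domain method for this task should agree, up to rounding, with the traditional spatial-domain route. As a reference point, the Python sketch below computes X from X1..X4 by inverse transforming each block, averaging every 2x2 pixel group of the assembled 16 x 16 area, and transforming back. It uses plain matrix products rather than any fast transform, and `downsample2_reference` is an illustrative name.

```python
import math

N = 8


def dct_matrix():
    """8-point DCT matrix, s(k, n) = c(k)/2 * cos((2n + 1) * pi * k / 16)."""
    return [[(1 / math.sqrt(2) if k == 0 else 1.0) / 2
             * math.cos((2 * n + 1) * math.pi * k / 16) for n in range(N)]
            for k in range(N)]


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


S = dct_matrix()
St = [list(row) for row in zip(*S)]


def dct2(x):
    return matmul(matmul(S, x), St)    # X = S x S^t


def idct2(X):
    return matmul(matmul(St, X), S)    # x = S^t X S


def downsample2_reference(X1, X2, X3, X4):
    """Spatial-domain reference; X1..X4 are the DCTs of the NW, NE, SW, SE blocks."""
    x1, x2, x3, x4 = (idct2(X) for X in (X1, X2, X3, X4))
    # Assemble the 16x16 area: top half is x1 | x2, bottom half is x3 | x4.
    area = [a + b for a, b in zip(x1, x2)] + [a + b for a, b in zip(x3, x4)]
    # Replace every nonoverlapping 2x2 group by its average.
    x = [[(area[2 * i][2 * j] + area[2 * i][2 * j + 1]
           + area[2 * i + 1][2 * j] + area[2 * i + 1][2 * j + 1]) / 4
          for j in range(N)] for i in range(N)]
    return dct2(x)
```

For four constant blocks with values 1, 2, 3 and 4, the output block has those values in its NW, NE, SW and SE quadrants, so its DC coefficient is 8 times the mean value 2.5, i.e. 20.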

The problems of downsampling by a factor of 3 and 4 are defined similarly, where the number of input blocks is 9 and 16, respectively, and the blocks $x_1, x_2, \ldots$ are indexed in raster scan order. As in the case of a factor of 2, every nonoverlapping group of 3 x 3 pixels for downsampling by 3, and of 4 x 4 pixels for downsampling by 4, is replaced by its average, and the output is one DCT block that is always denoted by X.

For the sake of simplicity, consider first the one dimensional case and downsampling by 2. The two dimensional case is a repeated application for every row and then for every column of each block. In this case, there are given two 8-dimensional vectors $X_1$ and $X_2$ of DCT coefficients corresponding to adjacent time domain vectors of length 8, $x_1 = S^{-1} X_1$ and $x_2 = S^{-1} X_2$, and it is necessary to calculate $X$, the DCT of the 8-dimensional vector $x$, each of whose components is the average of the two appropriate adjacent components in $x_1$ or $x_2$.

It is convenient to describe the downsampling operation in a matrix form as follows. 



(6) 



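where

$$Q_1 = \begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}$$

and

$$Q_2 = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1
\end{pmatrix}$$

Therefore,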
(7)
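The one-dimensional matrix formulation can be checked numerically. In this Python sketch the averaging factor 1/2 is folded into Q1 and Q2 (an assumption: the matrices as printed above show only their 0/1 pattern), and since S is orthonormal, S^-1 = S^t, so U_i = S Q_i S^t. The fixed matrices U1 and U2 are computed once and then applied directly to the DCT vectors; the helper names are illustrative.

```python
import math

N = 8


def dct_matrix():
    """8-point DCT matrix, s(k, n) = c(k)/2 * cos((2n + 1) * pi * k / 16)."""
    return [[(1 / math.sqrt(2) if k == 0 else 1.0) / 2
             * math.cos((2 * n + 1) * math.pi * k / 16) for n in range(N)]
            for k in range(N)]


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


# ASSUMPTION: the 1/2 averaging factor is folded into Q1 and Q2.
# Q1 maps pairs of x1 samples to outputs 0..3; Q2 maps pairs of x2 samples to outputs 4..7.
Q1 = [[0.5 if r < 4 and c // 2 == r else 0.0 for c in range(N)] for r in range(N)]
Q2 = [[0.5 if r >= 4 and c // 2 == r - 4 else 0.0 for c in range(N)] for r in range(N)]

S = dct_matrix()
St = [list(row) for row in zip(*S)]  # S^-1 = S^t for the orthonormal DCT
U1 = matmul(matmul(S, Q1), St)       # fixed 8x8 matrices, computed once
U2 = matmul(matmul(S, Q2), St)


def downsample_1d_dct(X1, X2):
    """DCT of the 2:1 downsampled vector, computed only from X1 and X2."""
    return [a + b for a, b in zip(matvec(U1, X1), matvec(U2, X2))]
```

Applied naively, each of U1 and U2 is a dense 8 x 8 matrix; the factorization that follows is what makes these products cheap.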



Consider next efficient factorizations of the matrices $U_1 = S Q_1 S^{-1}$ and $U_2 = S Q_2 S^{-1}$. To this end, a factorization of S is used that corresponds to the fastest existing algorithm for the 8-point DCT. See W. B. Pennebaker, J. L. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, pp. 50-63, 1993. According to this factorization, S is represented as follows:



$$S = D\,P\,B_1\,B_2\,M\,A_1\,A_2\,A_3 \qquad (8)$$



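where D is a diagonal matrix given by:

$$D = \mathrm{diag}\{0.3536,\ 0.2549,\ 0.2706,\ 0.3007,\ 0.3536,\ 0.4500,\ 0.6533,\ 1.2814\} \qquad (9a)$$

P is a permutation matrix given by:

$$P = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix} \qquad (9b)$$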

and the remaining matrices are defined as follows:

$$\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 1\\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & -1 & 0\\
0 & 0 & 0 & 0 & -1 & 0 & 0 & 1
\end{pmatrix} \qquad (9c)$$

$$\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & -1 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1\\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & -1 & 0 & 1
\end{pmatrix} \qquad (9h)$$

Thus, for $i = 1, 2$ we have:

$$U_i = SQ_iS^{-1} = DPB_1B_2MA_1A_2A_3\,Q_i\,A_3^{-1}A_2^{-1}A_1^{-1}M^{-1}B_2^{-1}B_1^{-1}P^{-1}D^{-1} \qquad (10)$$
The downsampling algorithm disclosed herein is based on the observation that the products

$$F_i = MA_1A_2A_3\,Q_i\,A_3^{-1}A_2^{-1}A_1^{-1}M^{-1}, \qquad i = 1, 2 \qquad (11)$$

are fairly sparse matrices, and most of their corresponding elements are the same, sometimes with a different sign. This means that their sum $F_+ = F_1 + F_2$ and their difference $F_- = F_1 - F_2$ are even sparser. These matrices are given as follows:
$$F_- = \begin{pmatrix}0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0.7653&0&1.8477&0\\0&0&0&0&-0.7653&0&1.8477&0\\0.5412&0&0&0&0&0&0&0\\0.7071&0&-1&0&0&0&0&0\\1.3066&0&0&0&0&0&0&0\\0.5000&0&0.7071&0&0&0&0&0\end{pmatrix}$$
Finally, using Equations 10 and 11, Equation 7 can be written as:

$$X = \frac{1}{4}DPB_1B_2\left[F_+B_2^{-1}B_1^{-1}P^{-1}D^{-1}(X_1 + X_2) + F_-B_2^{-1}B_1^{-1}P^{-1}D^{-1}(X_1 - X_2)\right] \qquad (12)$$
EP 0 794 674 A2

Next, count the number of basic arithmetic operations on a microprocessor that are needed to implement the right-most side of Equation 12, and compare it to the spatial domain approach. As explained above, the term operation here means an elementary arithmetic computation of a typical microprocessor. Such an operation is either a shift, an add, or a shift and add: shift one and add (SH1ADD), shift two and add (SH2ADD), or shift three and add (SH3ADD). For example, the computation $z = 1.375x + 1.125y$ is implemented as follows:

First, compute $u = x + 0.5x$ (SH1ADD), then $v = x + 0.25u$ (SH2ADD), afterwards $w = v + y$ (ADD), and finally $z = w + 0.125y$ (SH3ADD). Thus, overall four basic operations are needed.
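Here is a minimal sketch of the four-step example above. It assumes integer operands, with right shifts standing in for the scalings by 0.5, 0.25 and 0.125 as a fixed-point processor would perform them. The function name is hypothetical.

```python
def shift_add_z(x: int, y: int) -> int:
    """Compute z = 1.375*x + 1.125*y with four shift-and-add steps.

    The result is exact when x and y are multiples of 8; otherwise the right
    shifts truncate, as they would in fixed-point hardware.
    """
    u = x + (x >> 1)   # SH1ADD: u = x + 0.5*x   = 1.5*x
    v = x + (u >> 2)   # SH2ADD: v = x + 0.25*u  = 1.375*x
    w = v + y          # ADD:    w = 1.375*x + y
    z = w + (y >> 3)   # SH3ADD: z = w + 0.125*y = 1.375*x + 1.125*y
    return z

print(shift_add_z(8, 16))  # 29
```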

When counting the operations, we use the fact that multiplications by $D^{-1}$ and $D$ can be ignored, because these can be absorbed in the dequantizer 14 and the quantizer 26, respectively. The matrices $P$ and $P^{-1}$ only change the order of the components, so they can be ignored as well.
Thus, the following operations are left, where the number of additions/subtractions and the number of nontrivial multiplications are shown in parentheses:

Creating $X_1 + X_2$ and $X_1 - X_2$: 16 operations (16 additions).
Two multiplications by $B_1^{-1}$: 8 operations (8 additions).
Two multiplications by $B_2^{-1}$: 8 operations (8 additions).
Multiplication by $F_+$: 23 operations (5 multiplications + 5 additions).
Multiplication by $F_-$: 28 operations (6 multiplications + 4 additions).
Adding the products: 8 operations (8 additions).
Multiplication by $B_2$: 4 operations (4 additions).
Multiplication by $B_1$: 4 operations (4 additions).

Total: 115 operations (11 multiplications + 57 additions).

In the spatial domain approach, on the other hand, the following operations are required:

Two IDCTs: 114 operations (10 multiplications + 60 additions).
Downsampling in the time domain: 8 operations (8 additions).
DCT: 42 operations (5 multiplications + 30 additions).
Total: 161 operations (15 multiplications + 98 additions).

As can be seen, the herein disclosed scheme saves about 30% of the operations in the one-dimensional case (115 versus 161 operations). Even greater reductions in complexity can be obtained in the two-dimensional case.

As a byproduct of the herein disclosed scheme, arithmetic precision is also gained. In the direct approach, each one of the matrices on the right-hand side of Equation 11 is multiplied one at a time. The roundoff errors associated with the finite word length representations of the elements of these matrices therefore accumulate in each step. In the herein disclosed scheme, on the other hand, it is possible to precompute $F_+$ and $F_-$ once and for all to any desired degree of precision, and then round off each element of these matrices to the allowed precision. The latter has better precision, as is discussed in greater detail below.

Consider now the two-dimensional case. A 2-D DCT can be performed as a row-wise DCT operation followed by a column-wise DCT operation. The row-wise DCT operation takes the 1-D DCT of each row of the spatial domain image block. The column-wise DCT operation takes the 1-D DCT of each column of the block resulting from the row-wise DCT operation. Therefore, the invention herein is readily applied to the two-dimensional case as well. The following discussion describes in detail the computation schemes for downsampling by factors of 2, 3, and 4.

Downsampling by 2

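The two-dimensional extension of Equation 6 is:

$$x = \frac{1}{4}\left(Q_1x_1Q_1^t + Q_1x_2Q_2^t + Q_2x_3Q_1^t + Q_2x_4Q_2^t\right) \qquad (13)$$

Note that $x$ is the downscaled-by-two 8x8 block for the region covered by the 8x8 blocks $x_1$ (upper left), $x_2$ (upper right), $x_3$ (lower left), and $x_4$ (lower right). The corresponding DCT domain extension of Equation 12 is

$$X = \frac{1}{4}\left(U_1X_1U_1^t + U_1X_2U_2^t + U_2X_3U_1^t + U_2X_4U_2^t\right) \qquad (14)$$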
Therefore, the present invention computes $X$ efficiently by directly manipulating the data in the DCT domain. The spatial domain approach requires three steps: explicitly performing the IDCT to compute $x_1$, $x_2$, $x_3$, and $x_4$, computing $x$ using Equation 13, and taking the DCT of $x$ to get $X$.
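Relations (13) and (14) can be checked the same way. This Python sketch is illustrative only, and its helper names are hypothetical. It computes the 2-D DCT as $SxS^t$, which is a 1-D DCT of every column followed by a 1-D DCT of every row. It then compares Equation 14 with the spatial domain approach, assuming $x_1$, $x_2$, $x_3$, $x_4$ are the upper-left, upper-right, lower-left and lower-right blocks.

```python
import math

N = 8

def dct_matrix(n=N):
    """Orthonormal n-point DCT-II matrix S."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def q_matrices():
    """Q1 and Q2 of Equation 6 (pair sums into the top and bottom halves)."""
    q1 = [[0] * N for _ in range(N)]
    q2 = [[0] * N for _ in range(N)]
    for r in range(N // 2):
        q1[r][2 * r] = q1[r][2 * r + 1] = 1
        q2[r + N // 2][2 * r] = q2[r + N // 2][2 * r + 1] = 1
    return q1, q2

S = dct_matrix()
S_T = transpose(S)
Q1, Q2 = q_matrices()
U1 = matmul(matmul(S, Q1), S_T)
U2 = matmul(matmul(S, Q2), S_T)

def dct2(x):
    """2-D DCT X = S x S^t: 1-D DCT of every column, then of every row."""
    return matmul(matmul(S, x), S_T)

def downsample2_dct(X1, X2, X3, X4):
    """Equation 14: X = (U1 X1 U1^t + U1 X2 U2^t + U2 X3 U1^t + U2 X4 U2^t) / 4."""
    U1t, U2t = transpose(U1), transpose(U2)
    terms = [matmul(matmul(U1, X1), U1t), matmul(matmul(U1, X2), U2t),
             matmul(matmul(U2, X3), U1t), matmul(matmul(U2, X4), U2t)]
    return [[sum(t[r][c] for t in terms) / 4 for c in range(N)] for r in range(N)]

def downsample2_spatial(x1, x2, x3, x4):
    """Equation 13 evaluated directly: average every 2x2 neighbourhood of the
    16x16 region formed by x1 (upper left), x2, x3 and x4 (lower right)."""
    big = [r1 + r2 for r1, r2 in zip(x1, x2)] + [r3 + r4 for r3, r4 in zip(x3, x4)]
    return [[(big[2 * r][2 * c] + big[2 * r][2 * c + 1]
              + big[2 * r + 1][2 * c] + big[2 * r + 1][2 * c + 1]) / 4
             for c in range(N)] for r in range(N)]
```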

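Again, it is desirable to express the right-hand side of Equation 14 in terms of $F_+$ and $F_-$. This uses

$$U_i = DPB_1B_2F_iB_2^{-1}B_1^{-1}P^{-1}D^{-1}, \qquad i = 1, 2,$$

and

$$F_1 = \frac{F_+ + F_-}{2}, \qquad F_2 = \frac{F_+ - F_-}{2}.$$

To this end, define

$$X_{++} = X_1 + X_2 + X_3 + X_4, \qquad (15)$$

$$X_{+-} = X_1 + X_2 - X_3 - X_4, \qquad (16)$$

$$X_{-+} = X_1 - X_2 + X_3 - X_4, \qquad (17)$$

and

$$X_{--} = X_1 - X_2 - X_3 + X_4. \qquad (18)$$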



Note that only 8 (and not 12) additions/subtractions per frequency component are required to create all of the foregoing linear combinations. First compute $X_1 \pm X_2$ and $X_3 \pm X_4$, and then $(X_1 + X_2) \pm (X_3 + X_4)$ and $(X_1 - X_2) \pm (X_3 - X_4)$. Now, Equation 14 can be rewritten as:

$$X = \frac{1}{16}DPB_1B_2\Big\{\big[F_+RX_{++} + F_-RX_{+-}\big]R^tF_+^t + \big[F_+RX_{-+} + F_-RX_{--}\big]R^tF_-^t\Big\}B_2^tB_1^tP^tD \qquad (19a)$$

where $R = B_2^{-1}B_1^{-1}P^{-1}D^{-1}$, so that $R^t = D^{-1}(P^{-1})^t(B_1^{-1})^t(B_2^{-1})^t$.

Counting the operations associated with the implementation of the right-most side of Equation 19a gives a total of 2824 operations. The spatial domain approach, on the other hand, requires 4512 operations. This means that 37.4% of the operations are saved using the inventive scheme disclosed herein.
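The following sketch shows the 8-addition butterfly for Equations 15 to 18 and the regrouping that leads from Equation 14 to Equation 19a. It is an illustrative check only. $Y_i$ stands for $B_2^{-1}B_1^{-1}P^{-1}D^{-1}X_i$ multiplied on the right by the transpose of $B_2^{-1}B_1^{-1}P^{-1}D^{-1}$. $F_1$, $F_2$ and $Y_i$ are generic random matrices, so the sketch verifies the algebra, not the particular sparse $F_+$ and $F_-$. The factor 1/4 here, times the 1/4 of Equation 14, gives the 1/16 of Equation 19a. Function names are hypothetical.

```python
def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def lin(*pairs):
    """Linear combination sum(coef * matrix) of equally sized matrices."""
    rows, cols = len(pairs[0][1]), len(pairs[0][1][0])
    return [[sum(k * m[r][c] for k, m in pairs) for c in range(cols)] for r in range(rows)]

def butterfly(X1, X2, X3, X4):
    """Equations 15-18 with 8 matrix additions/subtractions: first X1 +/- X2 and
    X3 +/- X4, then combine the partial sums and differences."""
    s12, d12 = lin((1, X1), (1, X2)), lin((1, X1), (-1, X2))
    s34, d34 = lin((1, X3), (1, X4)), lin((1, X3), (-1, X4))
    return {"++": lin((1, s12), (1, s34)),    # X1 + X2 + X3 + X4
            "+-": lin((1, s12), (-1, s34)),   # X1 + X2 - X3 - X4
            "-+": lin((1, d12), (1, d34)),    # X1 - X2 + X3 - X4
            "--": lin((1, d12), (-1, d34))}   # X1 - X2 - X3 + X4

def direct(F1, F2, Y):
    """F1 Y1 F1^t + F1 Y2 F2^t + F2 Y3 F1^t + F2 Y4 F2^t (pattern of Equation 14)."""
    pairs = [(F1, F1), (F1, F2), (F2, F1), (F2, F2)]
    return lin(*[(1, matmul(matmul(Fa, Yi), transpose(Fb)))
                 for (Fa, Fb), Yi in zip(pairs, Y)])

def regrouped(F1, F2, Y):
    """The same sum written with F+ = F1 + F2 and F- = F1 - F2, grouped as in 19a."""
    Fp, Fm = lin((1, F1), (1, F2)), lin((1, F1), (-1, F2))
    Yc = butterfly(*Y)
    left_p = lin((1, matmul(Fp, Yc["++"])), (1, matmul(Fm, Yc["+-"])))
    left_m = lin((1, matmul(Fp, Yc["-+"])), (1, matmul(Fm, Yc["--"])))
    return lin((0.25, matmul(left_p, transpose(Fp))), (0.25, matmul(left_m, transpose(Fm))))
```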

Additional savings in computations can be made by taking advantage of the fact that, in typical images, most of the DCT blocks $X_i$ have only a few nonzero coefficients, which are normally the low frequency coefficients.

One approach to implementing these additional computational savings is to use a mechanism that operates in two steps. In the first step, DCT blocks are classified as being lowpass or non-lowpass. A lowpass block is defined as a block in which only the upper left 4x4 subblock is nonzero. The second step uses either the computation scheme described above for non-lowpass blocks, or a faster scheme that uses the lowpass assumption for the precomputation of the above matrix multiplications. It turns out that if $X_1, \ldots, X_4$ are all lowpass blocks, the reduction in computations is about 80%. The same approach is applicable to the downsamplings by 3 and by 4 described below.
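The first step of the two-step mechanism, the lowpass test, might look like the following sketch. `select_scheme` is a hypothetical dispatcher, and the sketch does not show how the lowpass-specialised matrices are precomputed.

```python
def is_lowpass(block):
    """True if every nonzero coefficient of an 8x8 DCT block lies in its
    upper-left 4x4 subblock."""
    return all(block[r][c] == 0
               for r in range(8) for c in range(8) if r >= 4 or c >= 4)

def select_scheme(blocks):
    """Use the lowpass-specialised computation only when all input blocks qualify;
    otherwise fall back to the general scheme."""
    return "lowpass" if all(is_lowpass(b) for b in blocks) else "general"
```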
Fig. 5 is a schematic illustrating downsampling by a factor of two by the downsampling means 17 of Fig. 4. In effect, the circuitry of Fig. 5 is a practical application of Equation 19a in the art of digital image processing.

The matrices $X_1$, $X_2$, $X_3$, and $X_4$ are obtained from determining means 16 and input to a matrix addition network 401, which produces $X_{++}$, $X_{-+}$, $X_{+-}$, and $X_{--}$. The matrix addition network 401 implements Equations (15), (16), (17), and (18). $X_{++}$ is multiplied by $P^{-1}$, $B_1^{-1}$, $B_2^{-1}$, and $F_+$ by matrix multipliers 403, 405, 407, and 409, respectively. Similarly, $X_{-+}$, $X_{+-}$, and $X_{--}$ are each multiplied by $P^{-1}$, $B_1^{-1}$, and $B_2^{-1}$ by matrix multipliers 411 through 427, respectively. The result from matrix multiplier 415 is further multiplied by $F_+$ by matrix multiplier 429. The results from multipliers 421 and 427 are multiplied by $F_-$ by matrix multipliers 431 and 433, respectively.

The products produced by matrix addition network 401 and matrix multipliers 403 through 433 are added by matrix adders 435 and 437. Matrix adder 435 adds the quantities $F_+B_2^{-1}B_1^{-1}P^{-1}X_{++}$ and $F_-B_2^{-1}B_1^{-1}P^{-1}X_{+-}$. Matrix adder 437 adds the quantities $F_+B_2^{-1}B_1^{-1}P^{-1}X_{-+}$ and $F_-B_2^{-1}B_1^{-1}P^{-1}X_{--}$.
Each of the sums produced by matrix adders 435 and 437 is multiplied by the matrices $P^{-1}$, $B_1^{-1}$, $B_2^{-1}$ by matrix multipliers 439 through 449. The product produced by matrix multiplier 443 is then multiplied by the matrix $F_+$ by matrix multiplier 451. The product produced by matrix multiplier 449 is then multiplied by the matrix $F_-$ by matrix multiplier 453. The resulting products from matrix multipliers 451 and 453 are then added by matrix adder 455. The resulting sum is:

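$$\big(F_+B_2^{-1}B_1^{-1}P^{-1}X_{++} + F_-B_2^{-1}B_1^{-1}P^{-1}X_{+-}\big)\big(B_2^{-1}B_1^{-1}P^{-1}\big)^tF_+^t + \big(F_+B_2^{-1}B_1^{-1}P^{-1}X_{-+} + F_-B_2^{-1}B_1^{-1}P^{-1}X_{--}\big)\big(B_2^{-1}B_1^{-1}P^{-1}\big)^tF_-^t \qquad (19b)$$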



The dequantizer 14 and the quantizer 26 include the computations involving the matrices $D$, $D^{-1}$, $D^t$, and $(D^{-1})^t$ of Equation 19a. Therefore, the downsampling means 17 needs no matrix multipliers for these matrices.

The sum from Equation (19b) is then multiplied by the remaining matrices of Equation (19a), namely the matrices $B_2$, $B_2^t$, $B_1$, $B_1^t$, $P$, and $P^t$, by matrix multipliers 457 through 467, respectively. Finally, the resulting product is multiplied by the downsampling factor 1/16 by multiplier 469.

Downsampling by 4

One approach to downsampling by 4 is to downsample twice by 2. However, the same methods yield a more efficient scheme for downsampling directly by 4. Downsampling by 4 involves the following downsampling matrices:
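$$Q_3 = \begin{pmatrix}1&1&1&1&0&0&0&0\\0&0&0&0&1&1&1&1\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\end{pmatrix}, \qquad Q_4 = \begin{pmatrix}0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\1&1&1&1&0&0&0&0\\0&0&0&0&1&1&1&1\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\end{pmatrix},$$

$$Q_5 = \begin{pmatrix}0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\1&1&1&1&0&0&0&0\\0&0&0&0&1&1&1&1\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\end{pmatrix}, \qquad Q_6 = \begin{pmatrix}0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\1&1&1&1&0&0&0&0\\0&0&0&0&1&1&1&1\end{pmatrix},$$

so that the downscaled block is

$$x = \frac{1}{16}\sum_{r=1}^{4}\sum_{c=1}^{4}Q_{r+2}\,x_{4(r-1)+c}\,Q_{c+2}^t,$$

where $x_1, \ldots, x_{16}$ are the sixteen 8x8 blocks of the $32 \times 32$ region, numbered row by row.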
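Next, define:

$$H_i = MA_1A_2A_3\,Q_{i+2}\,A_3^{-1}A_2^{-1}A_1^{-1}M^{-1}, \qquad i = 1, \ldots, 4 \qquad (20)$$

and

$$H_{+++} = H_1 + H_2 + H_3 + H_4 \qquad (21)$$

$$H_{+--} = H_1 + H_2 - H_3 - H_4 \qquad (22)$$

$$H_{-+-} = H_1 - H_2 + H_3 - H_4 \qquad (23)$$

$$H_{--+} = H_1 - H_2 - H_3 + H_4 \qquad (24)$$

Also define the following linear combinations on the input data:

$$X_i^{+++} = X_i + X_{i+4} + X_{i+8} + X_{i+12} \qquad (25)$$

$$X_i^{--+} = X_i - X_{i+4} - X_{i+8} + X_{i+12} \qquad (26)$$

$$X_i^{-+-} = X_i - X_{i+4} + X_{i+8} - X_{i+12} \qquad (27)$$

$$X_i^{+--} = X_i + X_{i+4} - X_{i+8} - X_{i+12} \qquad (28)$$

where $i = 1, 2, 3, 4$,

$$X^{+++}_{+++} = X_1^{+++} + X_2^{+++} + X_3^{+++} + X_4^{+++},$$

$$X^{+++}_{+--} = X_1^{+++} + X_2^{+++} - X_3^{+++} - X_4^{+++},$$

$$X^{+++}_{-+-} = X_1^{+++} - X_2^{+++} + X_3^{+++} - X_4^{+++},$$

$$X^{+++}_{--+} = X_1^{+++} - X_2^{+++} - X_3^{+++} + X_4^{+++},$$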
and similar definitions for the superscripts $+--$, $-+-$, and $--+$. Creating all these combinations requires 64 additions/subtractions per frequency component. Now, similarly as in Equation 19a:



$$X = \frac{1}{256}DPB_1B_2\left\{\sum_{s}H_sR\left[\sum_{s'}X^{s}_{s'}R^tH_{s'}^t\right]\right\}B_2^tB_1^tP^tD \qquad (37)$$

where $s$ and $s'$ each range over $\{+++, +--, -+-, --+\}$ and $R = B_2^{-1}B_1^{-1}P^{-1}D^{-1}$.
Implementing this formula takes 8224 operations, compared with 16224 operations in the spatial domain approach.

Equation 37 may be implemented using hardware matrix multipliers and adders in a similar fashion as described 
above in conjunction with Figure 4. 
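A two-stage 4-point Hadamard butterfly illustrates the count of 64 additions/subtractions per frequency component for downsampling by 4, as in this Python sketch. It assumes the sixteen input blocks are numbered row by row, so that $X_i, X_{i+4}, X_{i+8}, X_{i+12}$ form the $i$-th column of blocks. The names are hypothetical.

```python
# Sign patterns labelled by the signs of the 2nd, 3rd and 4th terms.
SIGNS = {"+++": (1, 1, 1, 1), "+--": (1, 1, -1, -1),
         "-+-": (1, -1, 1, -1), "--+": (1, -1, -1, 1)}

def hadamard4(a, b, c, d):
    """All four signed sums of a, b, c, d using 8 additions/subtractions."""
    s1, d1 = a + b, a - b
    s2, d2 = c + d, c - d
    return {"+++": s1 + s2, "+--": s1 - s2, "-+-": d1 + d2, "--+": d1 - d2}

def combine16(X):
    """X[0..15]: one frequency component of X_1..X_16 (row-by-row numbering).
    Returns ({(s, t): X^s_t}, number of additions/subtractions used)."""
    adds = 0
    cols = []
    for i in range(4):                  # first stage: X_i^s over each column of blocks
        cols.append(hadamard4(X[i], X[i + 4], X[i + 8], X[i + 12]))
        adds += 8
    out = {}
    for s in SIGNS:                     # second stage: X^s_t across the four columns
        for t, v in hadamard4(*(cols[i][s] for i in range(4))).items():
            out[(s, t)] = v
        adds += 8
    return out, adds
```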

Downsampling by 3

Downsampling by a factor of 3 is more problematic and less elegant than downsampling by factors of 2 and 4, because some of the 3x3 blocks to be averaged are not entirely within one 8x8 DCT block. In this case, there are three types of downsampling matrices:
$$Q_7 = \begin{pmatrix}1&1&1&0&0&0&0&0\\0&0&0&1&1&1&0&0\\0&0&0&0&0&0&1&1\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\end{pmatrix}, \qquad Q_8 = \begin{pmatrix}0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\1&0&0&0&0&0&0&0\\0&1&1&1&0&0&0&0\\0&0&0&0&1&1&1&0\\0&0&0&0&0&0&0&1\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\end{pmatrix},$$

and

$$Q_9 = \begin{pmatrix}0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\\1&1&0&0&0&0&0&0\\0&0&1&1&1&0&0&0\\0&0&0&0&0&1&1&1\end{pmatrix},$$

so that the downscaled block is

$$x = \frac{1}{9}\sum_{r=1}^{3}\sum_{c=1}^{3}Q_{r+6}\,x_{3(r-1)+c}\,Q_{c+6}^t, \qquad (38)$$

where $x_1, \ldots, x_9$ are the nine 8x8 blocks of the $24 \times 24$ region, numbered row by row.
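Similarly as in the cases of factors 2 and 4, define:

$$T_1 = MA_1A_2A_3\,Q_7\,A_3^{-1}A_2^{-1}A_1^{-1}M^{-1} \qquad (39)$$

$$T_2 = MA_1A_2A_3\,Q_8\,A_3^{-1}A_2^{-1}A_1^{-1}M^{-1} \qquad (40)$$

$$T_3 = MA_1A_2A_3\,Q_9\,A_3^{-1}A_2^{-1}A_1^{-1}M^{-1} \qquad (41)$$

The computation scheme is based on the fact that the matrices

$$T = T_1 + T_2 + T_3, \qquad (42)$$

$$T_+ = T_1 + T_3, \qquad (43)$$

$$T_- = \frac{T_1 - T_3}{2} \qquad (44)$$

are relatively sparse. It is also based on the identity:

$$T_1X + T_2Y + T_3Z = TY + T_+\left(\frac{X + Z}{2} - Y\right) + T_-(X - Z). \qquad (45)$$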
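The following is a quick numerical check of the identity in Equation 45, using random matrices that do not commute. It is an illustrative sketch only, and the helper names are hypothetical.

```python
import random

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]

def lin(*pairs):
    """Linear combination sum(coef * matrix) of equally sized matrices."""
    rows, cols = len(pairs[0][1]), len(pairs[0][1][0])
    return [[sum(k * m[r][c] for k, m in pairs) for c in range(cols)] for r in range(rows)]

def identity_gap(n=5, seed=7):
    """Largest |lhs - rhs| entry of Equation 45 for random T1, T2, T3, X, Y, Z."""
    rng = random.Random(seed)

    def rnd():
        return [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

    T1, T2, T3, X, Y, Z = (rnd() for _ in range(6))
    T = lin((1, T1), (1, T2), (1, T3))      # Equation 42
    Tp = lin((1, T1), (1, T3))              # Equation 43
    Tm = lin((0.5, T1), (-0.5, T3))         # Equation 44
    lhs = lin((1, matmul(T1, X)), (1, matmul(T2, Y)), (1, matmul(T3, Z)))
    rhs = lin((1, matmul(T, Y)),
              (1, matmul(Tp, lin((0.5, X), (0.5, Z), (-1, Y)))),
              (1, matmul(Tm, lin((1, X), (-1, Z)))))
    return max(abs(lhs[r][c] - rhs[r][c]) for r in range(n) for c in range(n))
```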



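Equation 38 is transformed to the DCT domain, and the fixed matrices are expressed in terms of $T$, $T_+$, and $T_-$ using Equation 45. After some algebraic manipulations, the following computation formula is obtained:

$$X = \frac{1}{9}DPB_1B_2\Big\{TR\big[X'_1R^tT^t + X'_2R^tT_+^t + X'_3R^tT_-^t\big] + T_+R\big[X'_4R^tT^t + X'_5R^tT_+^t + X'_6R^tT_-^t\big] + T_-R\big[X'_7R^tT^t + X'_8R^tT_+^t + X'_9R^tT_-^t\big]\Big\}B_2^tB_1^tP^tD \qquad (46)$$

where $R = B_2^{-1}B_1^{-1}P^{-1}D^{-1}$ and

$$X'_1 = X_5 \qquad (47)$$

$$X'_2 = \tfrac{1}{2}(X_4 + X_6) - X_5 \qquad (48)$$

$$X'_3 = X_4 - X_6 \qquad (49)$$

$$X'_4 = \tfrac{1}{2}(X_2 + X_8) - X_5 \qquad (50)$$

$$X'_5 = \tfrac{1}{2}\Big[\tfrac{1}{2}(X_1 + X_3) - X_2 + \tfrac{1}{2}(X_7 + X_9) - X_8\Big] - \Big[\tfrac{1}{2}(X_4 + X_6) - X_5\Big] \qquad (51)$$

$$X'_6 = \tfrac{1}{2}(X_1 - X_3 + X_7 - X_9) - (X_4 - X_6) \qquad (52)$$

$$X'_7 = X_2 - X_8 \qquad (53)$$

$$X'_8 = \Big[\tfrac{1}{2}(X_1 + X_3) - X_2\Big] - \Big[\tfrac{1}{2}(X_7 + X_9) - X_8\Big] \qquad (54)$$

$$X'_9 = X_1 - X_3 - X_7 + X_9 \qquad (55)$$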

The transformation from $\{X_1, \ldots, X_9\}$ to $\{X'_1, \ldots, X'_9\}$ can be performed in 18 operations per frequency component.
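One way to realise the 18 operations is sketched below. The evaluation order is an assumption, since the text does not spell it out. A halving fused with an addition or subtraction is counted as one shift-and-add operation. The blocks are assumed to be numbered row by row, with $X_5$ in the centre. Each key names the (left, right) pair of fixed matrices that multiplies the combination in Equation 46. The function name is hypothetical.

```python
def downsample3_combinations(X):
    """X = (X_1, ..., X_9) for one frequency component. Returns X'_1..X'_9 of
    Equations 47-55, keyed by the (left, right) fixed matrices of Equation 46.
    Comments give the running operation count."""
    x1, x2, x3, x4, x5, x6, x7, x8, x9 = X
    s13, d13 = x1 + x3, x1 - x3
    s79, d79 = x7 + x9, x7 - x9
    s46, d46 = x4 + x6, x4 - x6
    s28, d28 = x2 + x8, x2 - x8                       # 8
    mid = 0.5 * s46 - x5                              # 9
    top = 0.5 * s13 - x2                              # 10
    bottom = 0.5 * s79 - x8                           # 11
    return {
        ("T", "T"): x5,                               # X'_1
        ("T", "T+"): mid,                             # X'_2
        ("T", "T-"): d46,                             # X'_3
        ("T+", "T"): 0.5 * s28 - x5,                  # X'_4 (12)
        ("T+", "T+"): 0.5 * (top + bottom) - mid,     # X'_5 (14)
        ("T+", "T-"): 0.5 * (d13 + d79) - d46,        # X'_6 (16)
        ("T-", "T"): d28,                             # X'_7
        ("T-", "T+"): top - bottom,                   # X'_8 (17)
        ("T-", "T-"): d13 - d79,                      # X'_9 (18)
    }
```

With scalar stand-ins $t_1, t_2, t_3$ for $T_1, T_2, T_3$, the nine combinations should reproduce $\sum_{r,c} t_r t_c X_{3(r-1)+c}$. In that check the weights are $t_1 + t_2 + t_3$ for $T$, $t_1 + t_3$ for $T_+$, and $(t_1 - t_3)/2$ for $T_-$.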

Implementing Equation 46 takes a total of 5728 operations. The spatial domain approach, on the other hand, takes 9392 operations, so the number of computations is reduced by about 39%.

In one embodiment, the downsampling by 3 method described above is implemented using matrix multipliers and matrix adders, similarly to the downsampling by 2 case described above in conjunction with Figure 4.

The herein disclosed computation scheme provides better arithmetic accuracy than the standard approach. To demonstrate this fact, the inventors tested both schemes for the case of downsampling by 2, with each element of each of the fixed matrices defined above represented by 8 bits.

In the first experiment, the elements of $x_1, \ldots, x_4$ were chosen as statistically independent random integers uniformly distributed in the set $\{0, 1, \ldots, 255\}$. First, compute $x$ and then $X$ directly from $x_1, \ldots, x_4$ for reference. Then compute the DCTs $X_1, \ldots, X_4$, with all DCT coefficients quantized and then dequantized according to a given quantization matrix $A$. From $X_1, \ldots, X_4$, compute $X$ using both the standard approach and the herein disclosed scheme, and compare the results to the reference version. The precision of each approach is measured in terms of the sum of squares of errors (MSE) in the DCT domain, and hence also in the spatial domain. When $A$ is an all-one matrix, the MSE of the herein disclosed scheme is about 3 dB better than that of the standard approach. When $A$ is the recommended JPEG quantization matrix for luminance, the herein disclosed scheme outperforms the standard approach by 1.2 dB (see W. B. Pennebaker, J. L. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, 1993). These results are reasonable: as the step sizes of the quantizer increase, quantization errors associated with the DCT coefficients tend to dominate the roundoff errors associated with inaccurate computations.

The second experiment is similar, but the test data are based on a real image rather than random data. When $A$ is an all-one matrix, the standard approach yields an SNR of 46.08 dB, while the herein disclosed scheme yields 49.24 dB, which is again a 3 dB improvement. When $A$ is the JPEG default quantizer, the figures are 36.63 dB and 36.84 dB, respectively. Here, the improvement is smaller than with random data, because most of the DCT coefficients are rounded to zero in both techniques.

Inverse Motion Compensation 

Fig. 6 is a block schematic diagram of an alternative view of the inverse motion compensation unit 10, which performs inverse motion compensation in the compressed domain of the image/video editor 130 according to the invention. As shown in Fig. 3, the image/video editor 130 may contain a downsampling unit 10. As discussed above in conjunction with Fig. 3, a Huffman decoder 12 partially decodes a compressed bitstream 128 from disk 124. The bitstream is compressed in accordance with any known DCT-based video compression scheme, such as MPEG or H.261. The dequantizer 14 then dequantizes the partially decompressed bitstream 128.

The dequantizer 14 is connected to the inverse motion compensation unit 30. A picture fetching module 41 of the inverse motion compensation unit 30 extracts two kinds of data from the dequantized and partially decompressed bitstream. The first is the data $X_1, X_2, \ldots$ in the form of 8x8 DCT blocks corresponding to the present picture (at time $T$). The second is a motion vector $h, w$, which corresponds to the difference between an anchor picture (at time $T-1$) and the present picture ($T$). The DCT block and motion vector information are then provided to a processing unit 40, which in effect executes the calculation of Equation 59 (discussed below).

As discussed in greater detail below, the 8x8 DCT block is processed to locate a desired region within the anchor picture $T-1$, such that up to four 8x8 DCT blocks are combined to yield a single 8x8 DCT block. A module 36 in the processor fetches $h, w$ from the memory 15, while another module 41 fetches the four 8x8 DCT blocks $X_1$, $X_2$, $X_3$, and $X_4$ that make up the picture ($T-1$). Another module 37 uses $h$ and $w$ to fetch the required $J$ and $K$ matrices and thereby precalculate a set of fixed matrices. Another module 38 fetches from the memory an 8x8 DCT block that corresponds to the present picture ($T$). Once the proper region of the anchor picture ($T-1$) is located, the processing unit 40 converts the four 8x8 DCT blocks that establish the anchor picture ($T-1$) into a single 8x8 DCT block by carrying out the inverse motion compensation calculation (e.g., Equation 59 below). An adder 39 in the processor then combines this block with an 8x8 DCT block that represents the difference between the anchor picture $T-1$ and the present picture $T$, producing an independent image. The inverse motion compensated DCT block 43 is then output to the quantizer 26 and subsequently to the Huffman encoder 24, producing an output 25 that may be stored on the disk drive 111, as shown in Fig. 3.

Motion compensation of compressed video means predicting each 8x8 spatial domain block $x$ of the current picture by a corresponding reference block $\hat{x}$ from a previous picture, and encoding the resulting prediction error block $e = x - \hat{x}$ by using the DCT (see Equations 1 to 5 above). In some of the pictures (e.g. B-pictures), blocks are estimated from both past and future reference blocks. See Coding of Moving Pictures and Associated Audio, Committee Draft of Standard ISO 11172, ISO/MPEG 90/176, December 1990; Video Codec for Audio Visual Services at p x 64 kbit/s, CCITT Recommendation H.261, 1990; and D. Le Gall, MPEG: A Video Compression Standard for Multimedia Applications, Commun. of the ACM, Vol. 34, No. 4, pp. 47-58, April 1991. For the sake of simplicity, assume that only the past is used (e.g. P-pictures); the extension is straightforward.

The best matching reference block $\hat{x}$ may not be aligned to the original 8x8 blocks of the reference picture. In general, the reference block may intersect four neighboring spatial domain blocks, henceforth denoted $x_1$, $x_2$, $x_3$, and $x_4$, which together form a 16x16 square. Here $x_1$ corresponds to northwest, $x_2$ to northeast, $x_3$ to southwest, and $x_4$ to southeast.

The goal is to compute the DCT $X$ of the current block $x = \hat{x} + e$ from the given DCT $E$ of the prediction error $e$ and the DCTs $X_1, \ldots, X_4$ of $x_1, \ldots, x_4$, respectively. Because

$$X = \hat{X} + E,$$

where $\hat{X}$ is the DCT of $\hat{x}$, the main remaining problem is calculating $\hat{X}$ directly from $X_1, \ldots, X_4$.

Let the intersection of the reference block $\hat{x}$ with $x_1$ form an $h \times w$ rectangle (i.e., having $h$ rows and $w$ columns), where $1 \le h \le 8$ and $1 \le w \le 8$. This means that the intersections of $\hat{x}$ with $x_2$, $x_3$, and $x_4$ are rectangles of sizes $h \times (8-w)$, $(8-h) \times w$, and $(8-h) \times (8-w)$, respectively.

Following Chang and Messerschmitt ibid., it is readily seen that $\hat{x}$ can be expressed as a superposition of appropriate windowed and shifted versions of $x_1, \ldots, x_4$, i.e.

$$\hat{x} = \sum_{i=1}^{4}c_{i1}\,x_i\,c_{i2} \qquad (56)$$

where $c_{ij}$, $i = 1, \ldots, 4$, $j = 1, 2$, are sparse 8x8 matrices of zeroes and ones that perform the corresponding window and shift operations. The basic idea behind the work of Chang and Messerschmitt ibid. is to use the distributive property of matrix multiplication with respect to the DCT. Specifically, because $S^tS = I$, Equation 56 may be rewritten as:

$$\hat{x} = \sum_{i=1}^{4}c_{i1}S^tS\,x_i\,S^tSc_{i2}. \qquad (57)$$

Next, premultiplying both sides of Equation 57 by $S$ and postmultiplying by $S^t$ gives:

$$\hat{X} = \sum_{i=1}^{4}C_{i1}X_iC_{i2} \qquad (58)$$
where $C_{ij}$ is the DCT of $c_{ij}$. Chang and Messerschmitt ibid. proposed to precompute the fixed matrices $C_{ij}$ for every possible combination of $w$ and $h$, and to compute $\hat{X}$ directly in the DCT domain using Equation 58. Most of the matrices $C_{ij}$ are not sparse, but computations can still be saved for two reasons. The first is the typical sparseness of $\{X_i\}$. The second is that the reference block might be aligned in one direction, i.e. either $w = 8$ or $h = 8$, in which case the right-hand side of Equation 58 contains only two terms. The block might even be aligned in both directions, $w = h = 8$, in which case

$$\hat{X} = X_1$$

and hence no computations at all are needed.

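The computation of $\hat{X}$ is performed here even more efficiently by using two main facts. First, some of the matrices $c_{ij}$, and hence the corresponding $C_{ij}$, are equal to each other for every given $w$ and $h$. Specifically,

$$c_{11} = c_{21} = U_h = \begin{pmatrix}0 & I_h\\ 0 & 0\end{pmatrix}, \qquad c_{12} = c_{32} = L_w = \begin{pmatrix}0 & 0\\ I_w & 0\end{pmatrix},$$

where $I_h$ and $I_w$ are identity matrices of dimension $h \times h$ and $w \times w$, respectively. Similarly,

$$c_{22} = c_{42} = U_{8-w}$$

and

$$c_{31} = c_{41} = L_{8-h}.$$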
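The window and shift matrices can be checked directly. The following Python sketch builds $U_k$ and $L_k$ and evaluates Equation 56 with the assignments above. It then compares the result with cutting the $8 \times 8$ reference block out of the $16 \times 16$ square formed by $x_1, \ldots, x_4$. The function names are hypothetical.

```python
def upper(k, n=8):
    """U_k: n x n matrix with I_k in its top-right corner (zero matrix for k = 0)."""
    return [[1 if r < k and c == r + n - k else 0 for c in range(n)] for r in range(n)]

def lower(k, n=8):
    """L_k: n x n matrix with I_k in its bottom-left corner (zero matrix for k = 0)."""
    return [[1 if c < k and r == c + n - k else 0 for c in range(n)] for r in range(n)]

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]

def reference_block(x1, x2, x3, x4, h, w):
    """Equation 56 with c11 = c21 = U_h, c12 = c32 = L_w, c22 = c42 = U_{8-w},
    c31 = c41 = L_{8-h}; x1..x4 are the NW, NE, SW, SE blocks, 1 <= h, w <= 8."""
    Uh, Lw, U8w, L8h = upper(h), lower(w), upper(8 - w), lower(8 - h)
    terms = [matmul(matmul(Uh, x1), Lw), matmul(matmul(Uh, x2), U8w),
             matmul(matmul(L8h, x3), Lw), matmul(matmul(L8h, x4), U8w)]
    return [[sum(t[r][c] for t in terms) for c in range(8)] for r in range(8)]
```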



The second observation that helps in saving computations is that, rather than fully precomputing $C_{ij}$, it is more efficient to leave these matrices factorized into relatively sparse matrices. In particular, the scheme herein also uses the factorization of $S$ described above (see Equations 8 to 9).

The two observations mentioned above are best used as follows. First, we precompute the fixed matrices

$$J_i = U_i\,(MA_1A_2A_3)^t, \qquad i = 1, 2, \ldots, 8$$

and

$$K_i = L_i\,(MA_1A_2A_3)^t, \qquad i = 1, 2, \ldots, 8.$$

These matrices are very structured, so premultiplication by $K_i$ or $J_i$ can be implemented very efficiently. Next, compute $\hat{X}$ by using the expression:



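$$\hat{X} = S\Big\{J_hB_2^tB_1^tP^tD\big[X_1DPB_1B_2J_w^t + X_2DPB_1B_2K_{8-w}^t\big] + K_{8-h}B_2^tB_1^tP^tD\big[X_3DPB_1B_2J_w^t + X_4DPB_1B_2K_{8-w}^t\big]\Big\}S^t \qquad (59)$$

which can easily be obtained from Equation 57, or by its dual form:

$$\hat{X} = S\Big\{\big[J_hB_2^tB_1^tP^tDX_1 + K_{8-h}B_2^tB_1^tP^tDX_3\big]DPB_1B_2J_w^t + \big[J_hB_2^tB_1^tP^tDX_2 + K_{8-h}B_2^tB_1^tP^tDX_4\big]DPB_1B_2K_{8-w}^t\Big\}S^t \qquad (60)$$

The expression used is whichever one requires fewer computations for the given $w$ and $h$.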

Fig. 7 is a block diagram of matrix arithmetic hardware for performing DCT-based inverse motion compensation according to the present invention. As discussed above, a module 41 fetches the matrices $X_1$, $X_2$, $X_3$, and $X_4$. Another module 36 fetches the motion vector $h, w$ from the memory 15. The motion vectors are used to obtain the appropriate matrices $J_i$ and $K_i$ from the memory 15.

The matrix $X_1$ is multiplied by the matrices $P$, $B_1$, $B_2$, and $J_w^t$ by matrix multipliers 703, 705, 707, and 709. $X_2$ is multiplied by the matrices $P$, $B_1$, $B_2$, and $K_{8-w}^t$ by matrix multipliers 703', 705', 707', and 709'. $X_3$ is multiplied by the matrices $P$, $B_1$, $B_2$, and $J_w^t$ by matrix multipliers 703'', 705'', 707'', and 709''. $X_4$ is multiplied by the matrices $P$, $B_1$, $B_2$, and $K_{8-w}^t$ by matrix multipliers 703''', 705''', 707''', and 709'''. The results of these matrix multiplications correspond to the quantities $X_1PB_1B_2J_w^t$, $X_2PB_1B_2K_{8-w}^t$, $X_3PB_1B_2J_w^t$, and $X_4PB_1B_2K_{8-w}^t$.

Adder 711 adds $X_1PB_1B_2J_w^t$ and $X_2PB_1B_2K_{8-w}^t$, and adder 711' adds $X_3PB_1B_2J_w^t$ and $X_4PB_1B_2K_{8-w}^t$. The results from adders 711 and 711' are then multiplied by the matrices $P^t$, $B_1^t$, $B_2^t$ by matrix multipliers 713, 715, 717 and 713', 715', 717', respectively. Matrix multiplier 719 then multiplies the output of matrix multiplier 717 by the matrix $J_h$. Matrix multiplier 719' multiplies the output of matrix multiplier 717' by $K_{8-h}$. Matrix adder 721 then adds the output matrices of matrix multipliers 719 and 719' to produce the output DCT $\hat{X}$. Finally, matrix adder 723 adds $\hat{X}$ to $E$ to produce the desired result $X$.

In one embodiment, the processing unit 40 is realized using hardware matrix multipliers and adders. In an alternative embodiment, the matrix manipulations illustrated in Fig. 6 are implemented in software and carried out on a general purpose computer. Other alternatives include programmable logic devices and firmware.

The following demonstrates how to implement fast multiplication by $J_i$ and $K_i$. As an example, consider $J_6$; the other matrices are handled in a similar fashion. The matrix $J_6$ is the following:

$$J_6 = \begin{pmatrix}1&-1&-a&0&b&a&c&0\\1&1&-a&-1&b&0&c&0\\1&1&-a&-1&-b&0&-c&0\\1&-1&-a&0&-b&-a&-c&0\\1&-1&a&0&c&-a&-b&0\\1&1&a&1&c&0&-b&-1\\0&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0\end{pmatrix}$$

where $a = 0.7071$, $b = 0.9239$, and $c = 0.3827$. To compute $u = J_6v$, where $u = (u_1, \ldots, u_8)^t$ and $v = (v_1, \ldots, v_8)^t$, calculate according to the following steps:
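$$y_1 = v_1 + v_2 \qquad (61)$$
$$y_2 = v_1 - v_2 \qquad (62)$$
$$y_3 = av_3 \qquad (63)$$
$$y_4 = av_6 \qquad (64)$$
$$y_5 = y_1 - y_3 \qquad (65)$$
$$y_6 = y_5 - v_4 \qquad (66)$$
$$y_7 = y_3 - y_4 \qquad (67)$$
$$y_8 = y_3 + y_4 \qquad (68)$$
$$y_9 = (b + c)(v_5 + v_7) \qquad (69)$$
$$y_{10} = cv_5 \qquad (70)$$
$$y_{11} = bv_7 \qquad (71)$$
$$y_{12} = y_9 - y_{10} - y_{11} \qquad (72)$$
$$y_{13} = y_{10} - y_{11} \qquad (73)$$
$$u_1 = y_2 - y_7 + y_{12} \qquad (74)$$
$$u_2 = y_6 + y_{12} \qquad (75)$$
$$u_3 = y_6 - y_{12} \qquad (76)$$
$$u_4 = y_2 - y_8 - y_{12} \qquad (77)$$
$$u_5 = y_2 + y_7 + y_{13} \qquad (78)$$
$$u_6 = y_1 + y_3 + v_4 + y_{13} - v_8 \qquad (79)$$
$$u_7 = 0 \qquad (80)$$
$$u_8 = 0 \qquad (81)$$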
This implementation requires 5 multiplications and 22 additions. 
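The following Python sketch compares the step-by-step evaluation of Equations 61 to 81 with a plain matrix-vector product by $J_6$. It takes $a$, $b$ and $c$ as $\cos(\pi/4)$, $\cos(\pi/8)$ and $\cos(3\pi/8)$, whose values are the constants quoted above. The function names are hypothetical.

```python
import math

A = math.cos(math.pi / 4)       # a = 0.7071
B = math.cos(math.pi / 8)       # b = 0.9239
C = math.cos(3 * math.pi / 8)   # c = 0.3827

J6 = [
    [1, -1, -A, 0, B, A, C, 0],
    [1, 1, -A, -1, B, 0, C, 0],
    [1, 1, -A, -1, -B, 0, -C, 0],
    [1, -1, -A, 0, -B, -A, -C, 0],
    [1, -1, A, 0, C, -A, -B, 0],
    [1, 1, A, 1, C, 0, -B, -1],
    [0] * 8,
    [0] * 8,
]

def j6_fast(v):
    """u = J6 v via steps 61-81: 5 multiplications and 22 additions/subtractions."""
    v1, v2, v3, v4, v5, v6, v7, v8 = v
    y1 = v1 + v2
    y2 = v1 - v2
    y3 = A * v3
    y4 = A * v6
    y5 = y1 - y3
    y6 = y5 - v4
    y7 = y3 - y4
    y8 = y3 + y4
    y9 = (B + C) * (v5 + v7)    # b + c is a precomputed constant
    y10 = C * v5
    y11 = B * v7
    y12 = y9 - y10 - y11        # = b*v5 + c*v7
    y13 = y10 - y11             # = c*v5 - b*v7
    return [y2 - y7 + y12, y6 + y12, y6 - y12, y2 - y8 - y12,
            y2 + y7 + y13, y1 + y3 + v4 + y13 - v8, 0.0, 0.0]

def j6_direct(v):
    """Reference: ordinary matrix-vector product."""
    return [sum(row[i] * v[i] for i in range(8)) for row in J6]
```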

Developing similar implementation schemes for all the matrices $J_1, \ldots, J_8$ gives the numbers $\{N_i\}$ of operations required to multiply by $\{J_i\}$, $1 \le i \le 8$. These are $N_1 = 18$, $N_2 = 24$, $N_3 = 38$, $N_4 = 39$, $N_5 = 40$, $N_6 = 43$, $N_7 = 44$, and $N_8 = 46$. Because the matrix $K_i$ has a structure similar to that of $J_i$ for every $1 \le i \le 8$, multiplication by $K_i$ also costs $N_i$ operations.

Again, when counting the operations in the implementation of Equations 59 or 60, multiplications by $D$ and $D^{-1}$ can be ignored, because these can be absorbed in the MPEG quantizer and dequantizer, respectively. The matrices $P$ and $P^{-1}$ only change the order of the components, so they can be ignored as well.

Thus, for a reference block in general position, i.e. $1 \le w \le 7$, $1 \le h \le 7$, the following is obtained:

1. Six multiplications by $B_1$ or $B_1^t$: 6 x 32 = 192 operations.

2. Six multiplications by $B_2$ or $B_2^t$: 6 x 32 = 192 operations.

3. Two multiplications by $J_w$ and $K_{8-w}$, and one by $J_h$ and $K_{8-h}$, or vice versa: $8 \times (N_h + N_{8-h} + N_w + N_{8-w} + \min\{N_h + N_{8-h}, N_w + N_{8-w}\})$ operations.

4. One 2-D DCT: 42 x 16 = 672 operations.

Total: $1056 + 8 \times (N_h + N_{8-h} + N_w + N_{8-w} + \min\{N_h + N_{8-h}, N_w + N_{8-w}\})$ operations.

Note that the additions of the products in Equations 59 and 60 are not counted, because the different summands are nonzero on disjoint subsets of the matrix element indices. When the reference block is aligned in the vertical direction only, i.e. $h = 8$ and $1 \le w \le 7$, then $K_{8-h} = K_0 = L_0(MA_1A_2A_3)^t = 0$. Therefore, Equations 59 and 60 contain only two terms. Furthermore, because $J_h = J_8 = U_8(MA_1A_2A_3)^t = (MA_1A_2A_3)^t$, Equation 59 degenerates to:

$$\hat{X} = \big[X_1DPB_1B_2J_w^t + X_2DPB_1B_2K_{8-w}^t\big]S^t \qquad (82)$$

which requires the following steps:

1. Two multiplications by $B_1$: 2 x 32 = 64 operations.
2. Two multiplications by $B_2$: 2 x 32 = 64 operations.
3. One multiplication by $J_w$ and one by $K_{8-w}$: $8(N_w + N_{8-w})$ operations.
4. One multiplication by $S^t$: 8 x 42 = 336 operations.
Total: $464 + 8(N_w + N_{8-w})$ operations.

Similarly, for the horizontally aligned case, where $w = 8$ and $1 \le h \le 7$, the number of computations is $464 + 8(N_h + N_{8-h})$. As mentioned earlier, when $w = h = 8$ no computations are required at all, because $\hat{X} = X_1$ is already given.

Using the above expressions, the number of computations for the worst case values of $h$ and $w$ is 2928 operations. The average number, assuming a uniform distribution on the pairs $\{(w, h) : 1 \le w \le 8, 1 \le h \le 8\}$, is 2300.5. On the other hand, the brute-force approach requires a total of 4320 operations: it performs the IDCT on $X_1, \ldots, X_4$, cuts the appropriate reference block in the spatial domain, and transforms it back. Compared with the brute-force method, computational complexity is therefore reduced by 32% in the worst case and by 46.8% on average.
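The worst-case figure can be reproduced from the operation counts above with a short Python sketch. It is illustrative only, and the function name is hypothetical.

```python
N = {1: 18, 2: 24, 3: 38, 4: 39, 5: 40, 6: 43, 7: 44, 8: 46}  # N_i from the text
BRUTE_FORCE = 4320

def ops(h, w):
    """Operations for a reference block whose intersection with x1 is h x w."""
    if h == 8 and w == 8:
        return 0                                  # X^ = X_1 is already given
    if h == 8:
        return 464 + 8 * (N[w] + N[8 - w])        # vertically aligned (Equation 82)
    if w == 8:
        return 464 + 8 * (N[h] + N[8 - h])        # horizontally aligned
    a, b = N[h] + N[8 - h], N[w] + N[8 - w]
    return 1056 + 8 * (a + b + min(a, b))         # general position (Equation 59 or 60)

worst = max(ops(h, w) for h in range(1, 9) for w in range(1, 9))
print(worst, f"{1 - worst / BRUTE_FORCE:.0%}")    # 2928 32%
```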

So far it has not been assumed that the input DCT matrices are sparse. Typically, a considerable percentage of the DCT blocks have only a few nonzero elements, normally those corresponding to low spatial frequencies in both directions. For simplicity, a DCT block is considered sparse if only the top left 4x4 quadrant, corresponding to low frequencies, is nonzero.

When $X_1, \ldots, X_4$ are assumed sparse in this sense, the multiplications by $J_i$ and $K_i$, $1 \le i \le 8$, can be implemented with fewer computations. The resulting counts are expressed in terms of the numbers $N'_i$ below, with one expression for $1 \le w \le 7$ and $1 \le h \le 7$. The count is $336 + 4(N'_w + N'_{8-w})$ for $h = 8$ and $1 \le w \le 7$, the corresponding expression in $h$ for $w = 8$ and $1 \le h \le 7$, and zero when $w = h = 8$, where:

$N'_1 = 15$, $N'_2 = 20$, $N'_3 = 26$, $N'_4 = 31$, $N'_5 = 36$, $N'_6 = 40$, $N'_7 = 41$, and $N'_8 = 42$.

This means that there are 1728 computations in the worst case and 1397.2 computations on average. These correspond to reductions of 60% and 68%, respectively, compared to the brute force approach.

For comparison with earlier results, Chang and Messerschmitt, ibid., have shown computation savings only if the DCT matrices are sparse enough and if a large percentage of the reference blocks are aligned in at least one direction. Specifically, these authors introduced three parameters: the reciprocal $\rho$ of the fraction of nonzero coefficients, the fraction $\alpha_1$ of reference blocks aligned in one direction, and the fraction $\alpha_2$ of completely unaligned reference blocks.

Consider first the worst-case situation in terms of block alignment, i.e., $\alpha_1 = 0$ and $\alpha_2 = 1$. The above definition of sparseness corresponds to $\rho = 4$. Chang et al. provide exact formulas for the number of multiplications and additions associated with their approach in terms of $\alpha_1$, $\alpha_2$, $\rho$, and the block size $N$ ($N = 8$ in MPEG). According to these formulas, 16 multiplications per pixel and 19 additions per pixel are required. To compare with generic microprocessor operations, assume that on average every multiplication requires up to 4 SHIFTs and 3 ADDs, and that SHIFTs and ADDs can be performed simultaneously. This means that a conservative estimate of the total number of operations per block is $(16 \times 3 + 19) \times 64 = 4288$ operations, which is much larger than the 1728 operations (see above) required by the herein disclosed scheme under the same circumstances.
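The per-block estimate for the Chang and Messerschmitt scheme can be reproduced as follows. The sketch simply encodes the text's assumption that each multiplication costs 3 operations once SHIFTs overlap with ADDs; the variable names are illustrative.

```python
MULS_PER_PIXEL = 16     # from the Chang et al. formulas, rho = 4, alpha_2 = 1
ADDS_PER_PIXEL = 19
OPS_PER_MUL = 3         # assumption from the text: SHIFTs overlap the 3 ADDs
PIXELS_PER_BLOCK = 8 * 8

ops_per_block = (MULS_PER_PIXEL * OPS_PER_MUL + ADDS_PER_PIXEL) * PIXELS_PER_BLOCK
print(ops_per_block)            # 4288
print(ops_per_block / 1728)     # ratio to the worst sparse case of this scheme
```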

As another point of comparison, note that a uniform distribution over $w$ and $h$ herein corresponds to $\alpha_1 = 14/64 = 0.219$ and $\alpha_2 = 49/64 = 0.766$, which is more pessimistic than the upper curve in Fig. 5 of Chang et al., ibid., where $\alpha_1 = 0.2$ and $\alpha_2 = 0.1$. Nevertheless, for $\rho = 1$, the scheme herein disclosed makes it possible to speed up the computations by a factor of $4320/2300.5 = 1.88$, compared to 0.6 in Chang et al., ibid., and for $\rho = 4$ the speedup is $4320/1397.2 = 3.09$, compared to approximately 2.0 in Chang et al., ibid. Furthermore, if it is assumed that $\alpha_1 = 0.2$ and $\alpha_2 = 0.1$, then it is possible to obtain speedup factors of 9.06 for $\rho = 1$ and about 15 for $\rho = 4$, which means an improvement by an order of magnitude compared to Chang et al., ibid.
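The alignment fractions for a uniform distribution can be obtained by counting the 64 pairs $(w, h)$. The sketch treats a reference block as aligned in one direction when exactly one of $w$, $h$ equals 8, and as completely unaligned when neither does; this reading of the definitions is an assumption. It also recomputes the speedup ratios from the average operation counts.

```python
from fractions import Fraction

pairs = [(w, h) for w in range(1, 9) for h in range(1, 9)]

# Aligned in exactly one direction: exactly one of w, h equals 8.
alpha1 = Fraction(sum((w == 8) != (h == 8) for w, h in pairs), len(pairs))
# Completely unaligned: neither w nor h equals 8.
alpha2 = Fraction(sum(w < 8 and h < 8 for w, h in pairs), len(pairs))
print(alpha1, float(alpha1))   # 7/32 0.21875
print(alpha2, float(alpha2))   # 49/64 0.765625

BRUTE_FORCE_OPS = 4320
for rho, avg_ops in ((1, 2300.5), (4, 1397.2)):
    print(f"rho={rho}: speedup {BRUTE_FORCE_OPS / avg_ops:.2f}")
# rho=1: speedup 1.88
# rho=4: speedup 3.09
```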

Although the invention is described herein with reference to the preferred embodiment, one skilled in the art will 
readily appreciate that other applications may be substituted for those set forth herein without departing from the spirit 
and scope of the present invention. Accordingly, the invention should only be limited by the Claims included below. 


Claims 

1. A method for achieving inverse motion compensation in the compressed domain, where such information has been compressed in accordance with a discrete cosine transform (DCT) based compression scheme, the method comprising the steps of:

Huffman decoding a compressed bitstream to extract a plurality of N x N DCT-based data blocks defining a pic- 
ture T-1 and to extract a motion vector h,w; 

processing n of said data blocks in accordance with at least one sampling matrix and with said motion vector to produce a single N x N DCT-based data block corresponding to a picture T, wherein n is used to define a region occupied by said picture T-1, and wherein said single data block is a combination of a data block consisting of a determined average of said n data blocks.

2. The method of Claim 1, said processing step further comprising the step of:

determining a DCT $X$ of a current block

$x = \hat{x} + e$

from a given DCT $E$ of a prediction error $e$ of the current block, and DCTs $X_1, \ldots, X_4$ of $x_1, \ldots, x_4$, respectively,

where

$X = \hat{X} + E$

and $\hat{X}$ is a DCT of $\hat{x}$, such that $\hat{X}$ is determined directly from $X_1, \ldots, X_4$.
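Because selecting the reference block from $x_1, \ldots, x_4$ is linear, it can be carried into the DCT domain as $\hat{X} = \sum_i (S c_{i1} S^t) X_i (S c_{i2} S^t)$, where the $c_{i1}, c_{i2}$ are spatial shift matrices (notation introduced here). The pure-Python sketch below checks this numerically against the brute-force route. It assumes that $x_1, x_2, x_3, x_4$ are the upper-left, upper-right, lower-left, and lower-right neighbours, that $h$ and $w$ are the height and width of the part of $\hat{x}$ taken from $x_1$ (so $w = h = 8$ gives $\hat{x} = x_1$), and that $S$ is the orthonormal 8-point DCT-II matrix. All helper names are hypothetical.

```python
import math
import random

N = 8


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix S, so that X = S x S^t for an n x n block x."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos((2 * j + 1) * k * math.pi / (2 * n))
             for j in range(n)] for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(r) for r in zip(*a)]


def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def upper_shift(m):
    """U_m = [[0, I_m], [0, 0]]: U_m x moves the last m rows of x to the top."""
    return [[1.0 if j == i + N - m else 0.0 for j in range(N)] for i in range(N)]


def lower_shift(m):
    """L_m = U_m^t: L_m x moves the first m rows of x to the bottom."""
    return transpose(upper_shift(m))


def dct2(x, s):
    return matmul(matmul(s, x), transpose(s))


def reference_block_spatial(x1, x2, x3, x4, h, w):
    """Brute force: cut the 8 x 8 block whose top-left h x w part lies in x1."""
    mosaic = ([r1 + r2 for r1, r2 in zip(x1, x2)]
              + [r3 + r4 for r3, r4 in zip(x3, x4)])
    r0, c0 = N - h, N - w
    return [row[c0:c0 + N] for row in mosaic[r0:r0 + N]]


def reference_block_dct(blocks, h, w, s):
    """DCT domain: X_hat = sum_i (S c_i1 S^t) X_i (S c_i2 S^t)."""
    st = transpose(s)
    left = [upper_shift(h), upper_shift(h), lower_shift(N - h), lower_shift(N - h)]
    right = [lower_shift(w), upper_shift(N - w), lower_shift(w), upper_shift(N - w)]
    out = [[0.0] * N for _ in range(N)]
    for Xi, c1, c2 in zip(blocks, left, right):
        C1 = matmul(matmul(s, c1), st)   # depends only on h
        C2 = matmul(matmul(s, c2), st)   # depends only on w
        out = add(out, matmul(matmul(C1, Xi), C2))
    return out


random.seed(0)
S = dct_matrix()
xs = [[[random.randint(0, 255) for _ in range(N)] for _ in range(N)] for _ in range(4)]
Xs = [dct2(x, S) for x in xs]
h, w = 3, 5
direct = dct2(reference_block_spatial(*xs, h, w), S)
via_dct = reference_block_dct(Xs, h, w, S)
err = max(abs(a - b) for ra, rb in zip(direct, via_dct) for a, b in zip(ra, rb))
print(f"max |difference| = {err:.1e}")  # floating-point round-off only
```

The products $S c S^t$ depend only on $(h, w)$, so a practical implementation would precompute them (or a factored form of them) rather than rebuild them per block as this sketch does.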

3. The method of Claim 1, further comprising the step of:
precomputing fixed matrices:

$J_i = U_i (M A_1 A_2 A_3)^t, \quad i = 1, 2, \ldots, 8$

and

$K_i = L_i (M A_1 A_2 A_3)^t, \quad i = 1, 2, \ldots, 8,$

where $Q_1$ and $Q_2$ are N x N DCT-based matrices;
where matrices $U_i = S Q_1 S^{-1}$ and $K_i = S Q_2 S^{-1}$;
where a factorization $S$ is represented as follows:

$S = D P B_1 B_2 M A_1 A_2 A_3$,

where $D$ is a fixed diagonal matrix,
where $P$ is a fixed permutation matrix; and
where $B_1, B_2, M, A_1, A_2, A_3$ are fixed N x N matrices.

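4. The method of Claim 3, further comprising the step of:

determining $\hat{X}$ in accordance with:

$\hat{X} = S \left[ J_h B_2^t B_1^t P^t D \left( X_1 D P B_1 B_2 J_w^t + X_2 D P B_1 B_2 K_{8-w}^t \right) + K_{8-h} B_2^t B_1^t P^t D \left( X_3 D P B_1 B_2 J_w^t + X_4 D P B_1 B_2 K_{8-w}^t \right) \right] S^t$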



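5. The method of Claim 3, further comprising the step of:
determining $\hat{X}$ in accordance with:

$\hat{X} = S \left\{ \left[ J_h B_2^t B_1^t P^t D X_1 + K_{8-h} B_2^t B_1^t P^t D X_3 \right] D P B_1 B_2 J_w^t + \left[ J_h B_2^t B_1^t P^t D X_2 + K_{8-h} B_2^t B_1^t P^t D X_4 \right] D P B_1 B_2 K_{8-w}^t \right\} S^t$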
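The factored expression that groups $X_1, X_3$ and $X_2, X_4$ relies only on $S^t = (M A_1 A_2 A_3)^t (D P B_1 B_2)^t$, so it can be checked without the particular fast-DCT factors, which are not spelled out here. The sketch below is an illustration under stated assumptions, not the patented implementation: an arbitrary split $S = G F$ stands in for $S = (D P B_1 B_2)(M A_1 A_2 A_3)$, with $F$ a random signed permutation so that $F^{-1} = F^t$. $U_i$ and $L_i$ are taken to be the spatial shift matrices $[[0, I_i], [0, 0]]$ and its transpose, giving $J_i = U_i F^t$ and $K_i = L_i F^t$. The factored result is compared with the direct sum of the four DCT-domain terms; all names are hypothetical.

```python
import math
import random

N = 8


def matmul(*ms):
    """Left-to-right product of matrices stored as lists of rows."""
    out = ms[0]
    for b in ms[1:]:
        out = [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row))
                for j in range(len(b[0]))] for row in out]
    return out


def T(a):
    """Transpose."""
    return [list(r) for r in zip(*a)]


def add(*ms):
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*ms)]


def upper_shift(m):  # U_m = [[0, I_m], [0, 0]] (assumed spatial shift)
    return [[1.0 if j == i + N - m else 0.0 for j in range(N)] for i in range(N)]


def lower_shift(m):  # L_m = [[0, 0], [I_m, 0]] = U_m^t
    return T(upper_shift(m))


# Orthonormal 8-point DCT-II matrix.
S = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos((2 * j + 1) * k * math.pi / (2 * N)) for j in range(N)]
     for k in range(N)]

# Hypothetical split S = G F: G stands in for D P B1 B2, F for M A1 A2 A3.
# F is a random signed permutation, so F^-1 = F^t and G = S F^t.
random.seed(1)
perm = random.sample(range(N), N)
F = [[random.choice((-1.0, 1.0)) if j == perm[i] else 0.0 for j in range(N)]
     for i in range(N)]
G = matmul(S, T(F))


def J(i):  # J_i = U_i F^t, fixed for each i
    return matmul(upper_shift(i), T(F))


def K(i):  # K_i = L_i F^t, fixed for each i
    return matmul(lower_shift(i), T(F))


def x_hat_factored(X1, X2, X3, X4, h, w):
    """Grouping by left column (X1, X3) and right column (X2, X4)."""
    Gt = T(G)
    left_col = add(matmul(J(h), Gt, X1), matmul(K(N - h), Gt, X3))
    right_col = add(matmul(J(h), Gt, X2), matmul(K(N - h), Gt, X4))
    inner = add(matmul(left_col, G, T(J(w))), matmul(right_col, G, T(K(N - w))))
    return matmul(S, inner, T(S))


def x_hat_direct(X1, X2, X3, X4, h, w):
    """Reference: sum of (S c_i1 S^t) X_i (S c_i2 S^t) over the four blocks."""
    def C(c):
        return matmul(S, c, T(S))
    return add(
        matmul(C(upper_shift(h)), X1, C(lower_shift(w))),
        matmul(C(upper_shift(h)), X2, C(upper_shift(N - w))),
        matmul(C(lower_shift(N - h)), X3, C(lower_shift(w))),
        matmul(C(lower_shift(N - h)), X4, C(upper_shift(N - w))),
    )


Xs = [[[random.uniform(-50, 50) for _ in range(N)] for _ in range(N)] for _ in range(4)]
a = x_hat_factored(*Xs, 3, 5)
b = x_hat_direct(*Xs, 3, 5)
print(max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)))  # round-off level
```

With a genuine fast factorization, the savings come from $J_i$ and $K_i$ having few nonzero entries, which the signed permutation used here does not attempt to model.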



6. The method of Claim 3, further comprising the step of:
determining $\hat{X}$ in accordance with:

$\hat{X} = \left[ X_1 D P B_1 B_2 J_w^t + X_2 D P B_1 B_2 K_{8-w}^t \right] S^t$



7. An apparatus (13) for achieving inverse motion compensation in the compressed domain, where such information has been compressed in accordance with a discrete cosine transform (DCT) based compression scheme, the apparatus comprising:

a Huffman decoder (12) for decoding a compressed bitstream to produce a plurality of N x N DCT-based data blocks defining a picture T-1 and to produce a motion vector h,w;

a processor (30) processing n of said data blocks in accordance with at least one sampling matrix and with said motion vector to produce a single N x N DCT-based data block corresponding to a frame T, wherein n is used to define a region occupied by said frame T-1, and wherein said single data block is a combination of a data block consisting of a determined average of said n data blocks.

8. The apparatus (13) of Claim 7, said processor further comprising:

means (40) for determining a DCT $X$ of a current block

$x = \hat{x} + e$

from a given DCT $E$ of a prediction error $e$, and DCTs $X_1, \ldots, X_4$ of $x_1, \ldots, x_4$, respectively, where

$X = \hat{X} + E$,

and $\hat{X}$ is a DCT of $\hat{x}$, such that $\hat{X}$ is determined directly from $X_1, \ldots, X_4$.

9. The apparatus of Claim 8, further comprising:
means for precomputing fixed matrices:

$J_i = U_i (M A_1 A_2 A_3)^t, \quad i = 1, 2, \ldots, 8$

and

$K_i = L_i (M A_1 A_2 A_3)^t, \quad i = 1, 2, \ldots, 8,$

where $Q_1$ and $Q_2$ are N x N DCT-based matrices;
where matrices $U_i = S Q_1 S^{-1}$ and $K_i = S Q_2 S^{-1}$;
where a factorization $S$ is represented as follows:

$S = D P B_1 B_2 M A_1 A_2 A_3$,

where $D$ is a fixed diagonal matrix,
where $P$ is a fixed permutation matrix; and
where $B_1, B_2, M, A_1, A_2, A_3$ are N x N fixed matrices.

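10. The apparatus of Claim 9, further comprising:

means for determining $\hat{X}$ in accordance with:

$\hat{X} = S \left[ J_h B_2^t B_1^t P^t D \left( X_1 D P B_1 B_2 J_w^t + X_2 D P B_1 B_2 K_{8-w}^t \right) + K_{8-h} B_2^t B_1^t P^t D \left( X_3 D P B_1 B_2 J_w^t + X_4 D P B_1 B_2 K_{8-w}^t \right) \right] S^t$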



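11. The apparatus of Claim 9, further comprising:

means for determining $\hat{X}$ in accordance with:

$\hat{X} = S \left\{ \left[ J_h B_2^t B_1^t P^t D X_1 + K_{8-h} B_2^t B_1^t P^t D X_3 \right] D P B_1 B_2 J_w^t + \left[ J_h B_2^t B_1^t P^t D X_2 + K_{8-h} B_2^t B_1^t P^t D X_4 \right] D P B_1 B_2 K_{8-w}^t \right\} S^t$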



12. The apparatus of Claim 9, further comprising:

means for determining $\hat{X}$ in accordance with:

$\hat{X} = \left[ X_1 D P B_1 B_2 J_w^t + X_2 D P B_1 B_2 K_{8-w}^t \right] S^t$


